METHOD TO SCREEN PHAGE DISPLAY LIBRARIES WITH DIFFERENT LIGANDS

The present invention relates to methods for selecting repertoires of polypeptides using 
generic and target ligands. In particular, the invention describes a method for selecting 
repertoires of antibody polypeptides with generic ligand to isolate functional subsets 
thereof. 



Introduction



The antigen binding domain of an antibody comprises two separate regions: a heavy chain
variable domain (VH) and a light chain variable domain (VL, which can be either Vκ or
Vλ). The antigen binding site itself is formed by six polypeptide loops: three from the VH
domain (H1, H2 and H3) and three from the VL domain (L1, L2 and L3). A diverse primary
repertoire of V genes that encode the VH and VL domains is produced by the
combinatorial rearrangement of gene segments. The VH gene is produced by the
recombination of three gene segments, VH, D and JH. In humans, there are approximately
51 functional VH segments (Cook and Tomlinson (1995) Immunol. Today, 16: 237), 25
functional D segments (Corbett et al. (1997) J. Mol. Biol., 268: 69) and 6 functional JH
segments (Ravetch et al. (1981) Cell, 27: 583), depending on the haplotype. The VH
segment encodes the region of the polypeptide chain which forms the first and second
antigen binding loops of the VH domain (H1 and H2), whilst the VH, D and JH segments
combine to form the third antigen binding loop of the VH domain (H3). The VL gene is
produced by the recombination of only two gene segments, VL and JL. In humans, there
are approximately 40 functional Vκ segments (Schable and Zachau (1993) Biol. Chem.
Hoppe-Seyler, 374: 1001), 31 functional Vλ segments (Williams et al. (1996) J. Mol.
Biol., 264: 220; Kawasaki et al. (1997) Genome Res., 7: 250), 5 functional Jκ segments
(Hieter et al. (1982) J. Biol. Chem., 257: 1516) and 4 functional Jλ segments (Vasicek
and Leder (1990) J. Exp. Med., 172: 609), depending on the haplotype. The VL segment
encodes the region of the polypeptide chain which forms the first and second antigen
binding loops of the VL domain (L1 and L2), whilst the VL and JL segments combine to
form the third antigen binding loop of the VL domain (L3). Antibodies selected from this
primary repertoire are believed to be sufficiently diverse to bind almost all antigens with
at least moderate affinity. High affinity antibodies are produced by "affinity maturation"
of the rearranged genes, in which point mutations are generated and selected by the
immune system on the basis of improved binding.
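
By way of illustration only, the approximate segment counts quoted above can be multiplied
out to give the number of distinct segment combinations. The sketch below is not part of
the method. It assumes that every segment combination is equally available and that any
heavy chain can pair with any light chain, and it ignores junctional diversity and somatic
mutation, so the figures understate the diversity of the primary repertoire. The
dictionary and function names are illustrative.

```python
# Approximate numbers of functional human germline segments quoted above.
# These are haplotype-dependent estimates, not exact constants.
SEGMENTS = {
    "VH": 51, "D": 25, "JH": 6,       # heavy chain
    "Vkappa": 40, "Jkappa": 5,        # kappa light chain
    "Vlambda": 31, "Jlambda": 4,      # lambda light chain
}


def heavy_combinations(seg):
    """Number of distinct VH-D-JH segment combinations."""
    return seg["VH"] * seg["D"] * seg["JH"]


def light_combinations(seg):
    """Number of distinct V-J combinations, summed over kappa and lambda."""
    return seg["Vkappa"] * seg["Jkappa"] + seg["Vlambda"] * seg["Jlambda"]


heavy = heavy_combinations(SEGMENTS)
light = light_combinations(SEGMENTS)
# Assumption: any heavy chain may pair with any light chain.
paired = heavy * light

print(f"heavy chain combinations: {heavy}")
print(f"light chain combinations: {light}")
print(f"paired combinations:      {paired}")
```

Under these assumptions the segment combinations alone give 7650 heavy chains, 324 light
chains and 2,478,600 pairings.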

Analysis of the structures and sequences of antibodies has shown that five of the six
antigen binding loops (H1, H2, L1, L2, L3) possess a limited number of main-chain
conformations or canonical structures (Chothia and Lesk (1987) J. Mol. Biol., 196: 901;
Chothia et al. (1989) Nature, 342: 877). The main-chain conformations are determined by
(i) the length of the antigen binding loop, and (ii) particular residues, or types of residue,
at certain key positions in the antigen binding loop and the antibody framework. Analysis
of the loop lengths and key residues has enabled us to predict the main-chain
conformations of H1, H2, L1, L2 and L3 encoded by the majority of human antibody
sequences (Chothia et al. (1992) J. Mol. Biol., 227: 799; Tomlinson et al. (1995) EMBO
J., 14: 4628; Williams et al. (1996) J. Mol. Biol., 264: 220). Although the H3 region is
much more diverse in terms of sequence, length and structure (due to the use of D
segments), it also forms a limited number of main-chain conformations for short loop
lengths which depend on the length and the presence of particular residues, or types of
residue, at key positions in the loop and the antibody framework (Martin et al. (1996) J.
Mol. Biol., 263: 800; Shirai et al. (1996) FEBS Letters, 399: 1).

A similar analysis of side-chain diversity in human antibody sequences has enabled the
separation of the pattern of sequence diversity in the primary repertoire from that created
by somatic hypermutation. It was found that the two patterns are complementary: diversity
in the primary repertoire is focused at the centre of the antigen binding site whereas somatic
hypermutation spreads diversity to regions at the periphery that are highly conserved in
the primary repertoire (Tomlinson et al. (1996) J. Mol. Biol., 256: 813; Ignatovich et al.
(1997) J. Mol. Biol., 268: 69). This complementarity seems to have evolved as an efficient
strategy for searching sequence space, given the limited number of B cells available for
selection at any given time. Thus, antibodies are first selected from the primary repertoire
based on diversity at the centre of the binding site. Somatic hypermutation is then left to
optimise residues at the periphery without disrupting favourable interactions established
during the primary response.

The recent advent of phage-display technology (Smith (1985) Science, 228: 1315; Scott
and Smith (1990) Science, 249: 386; McCafferty et al. (1990) Nature, 348: 552) has
enabled the in vitro selection of human antibodies against a wide range of target antigens
from "single pot" libraries. These phage-antibody libraries can be grouped into two
categories: natural libraries which use rearranged V genes harvested from human B cells
(Marks et al. (1991) J. Mol. Biol., 222: 581; Vaughan et al. (1996) Nature Biotech., 14:
309) or synthetic libraries whereby germline V gene segments are 'rearranged' in vitro
(Hoogenboom & Winter (1992) J. Mol. Biol., 227: 381; Nissim et al. (1994) EMBO J.,
13: 692; Griffiths et al. (1994) EMBO J., 13: 3245; De Kruif et al. (1995) J. Mol. Biol.,
248: 97) or where synthetic CDRs are incorporated into a single rearranged V gene
(Barbas et al. (1992) Proc. Natl. Acad. Sci. USA, 89: 4457). Although synthetic libraries
help to overcome the inherent biases of the natural repertoire which can limit the effective
size of phage libraries constructed from rearranged V genes, they require the use of long
degenerate PCR primers which frequently introduce base-pair deletions into the
assembled V genes. This high degree of randomisation may also lead to the creation of
antibodies which are unable to fold correctly and are therefore also non-functional.
Furthermore, antibodies selected from these libraries may be poorly expressed and, in
many cases, will contain framework mutations that may affect the antibodies'
immunogenicity when used in human therapy.

Recently, in an extension of the synthetic library approach it has been suggested
(WO97/08320, Morphosys) that human antibody frameworks can be pre-optimised by
synthesising a set of 'master genes' that have consensus framework sequences and
incorporate amino acid substitutions shown to improve folding and expression. Diversity
in the CDRs is then incorporated using oligonucleotides. Since it is desirable to produce
artificial human antibodies which will not be recognised as foreign by the human immune
system, the use of consensus frameworks which, in most cases, do not correspond to any
natural framework is a disadvantage of this approach. Furthermore, since it is likely that
the CDR diversity will also have an effect on folding and/or expression, it is preferable to
optimise the folding and/or expression (and remove any frame-shifts or stop codons) after
the V gene has been fully assembled. To this end, it would be desirable to have a selection
system which could eliminate non-functional or poorly folded/expressed members of the
library before selection with the target antigen is carried out.


A further problem with the libraries of the prior art is that, because the main-chain
conformation is heterogeneous, three-dimensional structural modelling is difficult because
suitable high resolution crystallographic data may not be available. This is a particular
problem for the H3 region, where the vast majority of antibodies derived from natural or
synthetic antibody libraries have medium length or long loops and therefore cannot be
modelled.

Summary of the Invention 

According to the first aspect of the present invention, there is provided a method for
selecting, from a repertoire of polypeptides, a population of functional polypeptides which
bind a target ligand in a first binding site and a generic ligand in a second binding site,
which generic ligand is capable of binding functional members of the repertoire regardless
of target ligand specificity, comprising the steps of:

a) contacting the repertoire with the generic ligand and selecting functional
polypeptides bound thereto; and
b) contacting the selected functional polypeptides with the target ligand and
selecting a population of polypeptides which bind to the target ligand.
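
Purely as a schematic aid, steps a) and b) can be pictured as two successive filters
applied to a repertoire. The following sketch is a toy model under stated assumptions and
not a laboratory protocol: the `Member` record, its `functional` flag and its
`specificity` label are hypothetical stand-ins for physical binding events, and the model
assumes that the generic ligand binds every functional member and no non-functional one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    functional: bool   # hypothetical flag: expressed and correctly folded
    specificity: str   # hypothetical label of the target it would bind


def binds_generic(member):
    # Model assumption: the generic ligand binds every functional member.
    return member.functional


def binds_target(member, target):
    # A member must be functional AND have the right specificity.
    return member.functional and member.specificity == target


def select(pool, predicate):
    """Keep only the members of the pool that satisfy the binding predicate."""
    return [m for m in pool if predicate(m)]


repertoire = [
    Member("clone1", True, "lysozyme"),
    Member("clone2", True, "ubiquitin"),
    Member("clone3", False, "lysozyme"),   # e.g. frame-shifted
    Member("clone4", False, "ubiquitin"),  # e.g. misfolded
]

# Step a): preselect with the generic ligand.
preselected = select(repertoire, binds_generic)
# Step b): select the preselected pool with the target ligand.
final = select(preselected, lambda m: binds_target(m, "lysozyme"))

# The same two selections applied in the opposite order.
reverse = select(
    select(repertoire, lambda m: binds_target(m, "lysozyme")), binds_generic
)

print([m.name for m in preselected])
print([m.name for m in final])
```

In this idealised model the final population is the same whichever ligand is applied
first. What changes is the size and quality of the intermediate pool carried into the
second selection.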

The invention accordingly provides a method by which a repertoire of polypeptides is
preselected, according to functionality as determined by the ability to bind the generic
ligand, and the subset of polypeptides obtained as a result of preselection is then
employed for further rounds of selection according to the ability to bind the target ligand.
Although, in a preferred embodiment, the repertoire is first selected with the generic
ligand, it will be apparent to one skilled in the art that the repertoire may be contacted
with the ligands in the opposite order, i.e. with the target ligand before the generic ligand.


The invention permits the person skilled in the art to remove, from a chosen repertoire of
polypeptides, those polypeptides which are non-functional, for example as a result of the
introduction of frame-shift mutations, stop codons, folding mutants or expression mutants
which would be or are incapable of binding to substantially any target ligand. Such
non-functional mutants are generated by the normal randomisation and variation procedures
employed in the construction of polypeptide repertoires. At the same time the invention
permits the person skilled in the art to enrich a chosen repertoire of polypeptides for those
polypeptides which are functional, well folded and highly expressed.

Preferably, two or more subsets of polypeptides are obtained from a repertoire by the
method of the invention, for example, by prescreening the repertoire with two or more
generic ligands, or by contacting the repertoire with the generic ligand(s) under different
conditions. Advantageously, the subsets of polypeptides thus obtained are combined to
form a further repertoire of polypeptides, which may be further screened by contacting
with target and/or generic ligands.

Preferably, the library according to the invention comprises polypeptides of the
immunoglobulin superfamily, such as antibody polypeptides or T-cell receptor
polypeptides. Advantageously, the library may comprise individual immunoglobulin
domains, such as the VH or VL domains of antibodies, or the Vβ or Vα domains of T-cell
receptors. In a preferred embodiment, therefore, repertoires of, for example, VH and VL
polypeptides may be individually prescreened using a generic ligand and then combined
to produce a functional repertoire comprising both VH and VL polypeptides. Such a
repertoire can then be screened with a target ligand in order to isolate polypeptides
comprising both VH and VL domains and having the desired binding specificity.

In an advantageous embodiment, the generic ligand selected for use with immunoglobulin
repertoires is a superantigen. Superantigens are able to bind to functional immunoglobulin
molecules, or subsets thereof comprising particular main-chain conformations,
irrespective of target ligand specificity. Alternatively, generic ligands may be selected
from any ligand capable of binding to the general structure of the polypeptides which
make up any given repertoire, such as antibodies themselves, metal ion matrices, organic
compounds including proteins or peptides, and the like.

In a second aspect, the invention provides a library wherein the functional members have
binding sites for both generic and target ligands. Libraries may be specifically designed
for this purpose, for example by constructing antibody libraries having a main-chain
conformation which is recognised by a given superantigen, or by constructing a library in
which substantially all potentially functional members possess a structure recognisable by
an antibody ligand.

In a third aspect, the invention provides a method for detecting, immobilising, purifying
or immunoprecipitating one or more members of a repertoire of polypeptides previously
selected according to the invention, comprising binding the members to the generic
ligand.

In a fourth aspect, the invention provides a library comprising a repertoire of polypeptides
of the immunoglobulin superfamily, wherein the members of the repertoire have a known
main-chain conformation.

In a fifth aspect, the invention provides a method for selecting a polypeptide having a
desired generic and/or target ligand binding site from a repertoire of polypeptides,
comprising the steps of:

a) expressing a library according to the preceding aspects of the invention;

b) contacting the polypeptides with generic and/or target ligands and selecting
those which bind the generic and/or target ligand;

c) optionally amplifying the selected polypeptide(s) which bind the generic and/or
target ligand; and

d) optionally repeating steps a) - c).

Repertoires of polypeptides are advantageously both generated and maintained in the form
of a nucleic acid library. Therefore, in a sixth aspect, the invention provides a nucleic acid
library encoding a repertoire of such polypeptides.

Brief Description of the Figures

Figure 1: Bar graph indicating positions in the VH and Vκ regions of the human antibody
repertoire which exhibit extensive natural diversity and make antigen contacts (see
Tomlinson et al. (1996) J. Mol. Biol., 256: 813). The H3 and the end of L3 are not shown
in this representation although they are also highly diverse and make antigen contacts.
Although sequence diversity in the human lambda genes has been thoroughly
characterised (see Ignatovich et al. (1997) J. Mol. Biol., 268: 69) very little data on antigen
contacts currently exists for three-dimensional lambda structures.

Figure 2: Sequence of the scFv that forms the basis of a library according to the
invention. There are currently two versions of the library: a "primary" library wherein 18
positions are varied and a "somatic" library wherein 12 positions are varied. The six loop
regions H1, H2, H3, L1, L2 and L3 are indicated. CDR regions as defined by Kabat
(Kabat et al. (1991) Sequences of proteins of immunological interest, U.S. Department of
Health and Human Services) are underlined.

Figure 3: Analysis of functionality in a library according to the invention before and after
selecting with the generic ligands Protein A and Protein L. Here Protein L is coated on an
ELISA plate, the scFv supernatants are bound to it and detection of scFv binding is with
Protein A-HRP. Therefore, only those scFv capable of binding both Protein A and Protein
L give an ELISA signal.

Figure 4: Sequences of clones selected from libraries according to the invention, after
panning with bovine ubiquitin, rat BIP, bovine histone, NIP-BSA, FITC-BSA, human
leptin, human thyroglobulin, BSA, hen egg lysozyme, mouse IgG and human IgG.
Underlines in the sequences indicate the positions which were varied in the respective
libraries.

Figure 5: 5a: Comparison of scFv concentration produced by the unselected and
preselected "primary" DVT libraries in host cells. 5b: standard curve of ELISA as
determined from known standards.

Figure 6: Western blot of phage from preselected and unselected DVT "primary"
libraries, probed with an anti-phage pIII antibody in order to determine the percentage of
phage bearing scFv.

Detailed Description of the Invention

Definitions 

Repertoire A repertoire is a population of diverse variants, for example nucleic acid
variants which differ in nucleotide sequence or polypeptide variants which differ in amino
acid sequence. A library according to the invention will encompass a repertoire of
polypeptides or nucleic acids. According to the present invention, a repertoire of
polypeptides is designed to possess a binding site for a generic ligand and a binding site
for a target ligand. The binding sites may overlap, or be located in the same region of the
molecule, but their specificities will differ.

Organism As used herein, the term "organism" refers to all cellular life-forms, such 
as prokaryotes and eukaryotes, as well as non-cellular, nucleic acid-containing entities, 
such as bacteriophage and viruses. 


Functional As used herein, the term "functional" refers to a polypeptide which
possesses either the native biological activity of the naturally-produced proteins of its
type, or any specific desired activity, for example as judged by its ability to bind to ligand
molecules, defined below. Examples of "functional" polypeptides include an antibody
binding specifically to an antigen through its antigen-binding site, a receptor molecule
(e.g. a T-cell receptor) binding its characteristic ligand and an enzyme binding to its
substrate. In order for a polypeptide to be classified as functional according to the
invention, it follows that it first must be properly processed and folded so as to retain its
overall structural integrity, as judged by its ability to bind the generic ligand, also defined
below.

For the avoidance of doubt, functionality is not equivalent to the ability to bind the target
ligand. For instance, a functional anti-CEA monoclonal antibody will not be able to bind
specifically to target ligands such as bacterial LPS. However, because it is capable of
binding a target ligand (i.e. it would be able to bind to CEA if CEA were the target ligand) it
is classed as a "functional" antibody molecule and may be selected by binding to a generic
ligand, as defined below. Typically, non-functional antibody molecules will be incapable
of binding to any target ligand.

Generic ligand A generic ligand is a ligand that binds a substantial proportion of
functional members in a given repertoire. Thus, the same generic ligand can bind many
members of the repertoire regardless of their target ligand specificities (see below). In
general, the presence of a functional generic ligand binding site indicates that the repertoire
member is expressed and folded correctly. Thus, binding of the generic ligand to its
binding site provides a method for preselecting functional polypeptides from a repertoire
of polypeptides.

Target ligand The target ligand is a ligand for which a specific binding member or
members of the repertoire is to be identified. Where the members of the repertoire are
antibody molecules, the target ligand may be an antigen and where the members of the
repertoire are enzymes, the target ligand may be a substrate. Binding to the target ligand is
dependent upon both the member of the repertoire being functional, as described above
under generic ligand, and upon the precise specificity of the binding site for the target
ligand.

Subset The subset is a part of the repertoire. In the terms of the present invention, it is
often the case that only a subset of the repertoire is functional and therefore possesses a
functional generic ligand binding site. Furthermore, it is also possible that only a fraction
of the functional members of a repertoire (yet significantly more than would bind a given
target ligand) will bind the generic ligand. These subsets are able to be selected according
to the invention.

Subsets of a library may be combined or pooled to produce novel repertoires which have
been preselected according to desired criteria. Combined or pooled repertoires may be
simple mixtures of the polypeptide members preselected by generic ligand binding, or
may be manipulated to combine two polypeptide subsets. For example, VH and VL
polypeptides may be individually prescreened, and subsequently combined at the genetic
level onto single vectors such that they are expressed as combined VH-VL dimers, such as
scFv.

Library The term library refers to a mixture of heterogeneous polypeptides or nucleic
acids. The library is composed of members, which have a single polypeptide or nucleic
acid sequence. To this extent, library is synonymous with repertoire. Sequence
differences between library members are responsible for the diversity present in the
library. The library may take the form of a simple mixture of polypeptides or nucleic
acids, or may be in the form of organisms or cells, for example bacteria, viruses, animal or
plant cells and the like, transformed with a library of nucleic acids. Preferably, each
individual organism or cell contains only one member of the library. Advantageously, the
nucleic acids are incorporated into expression vectors, in order to allow expression of the
polypeptides encoded by the nucleic acids. In a preferred aspect, therefore, a library may
take the form of a population of host organisms, each organism containing one or more
copies of an expression vector containing a single member of the library in nucleic acid
form which can be expressed to produce its corresponding polypeptide member. Thus, the
population of host organisms has the potential to encode a large repertoire of genetically
diverse polypeptide variants.


Immunoglobulin superfamily This refers to a family of polypeptides which retain the
immunoglobulin fold characteristic of immunoglobulin (antibody) molecules, which
contains two β sheets and, usually, a conserved disulphide bond. Members of the
immunoglobulin superfamily are involved in many aspects of cellular and non-cellular
interactions in vivo, including widespread roles in the immune system (for example,
antibodies, T-cell receptor molecules and the like), involvement in cell adhesion (for
example the ICAM molecules) and intracellular signalling (for example, receptor
molecules, such as the PDGF receptor). The present invention is applicable to all
immunoglobulin superfamily molecules, since variation therein is achieved in similar
ways. Preferably, the present invention relates to immunoglobulins (antibodies).

Main-chain conformation The main-chain conformation refers to the Cα backbone trace
of a structure in three dimensions. When individual hypervariable loops of antibodies or
TCR molecules are considered, the main-chain conformation is synonymous with the
canonical structure. As set forth in Chothia and Lesk (1987) J. Mol. Biol., 196: 901 and
Chothia et al. (1989) Nature, 342: 877, antibodies display a limited number of canonical
structures for five of their six hypervariable loops (H1, H2, L1, L2 and L3), despite
considerable side-chain diversity in the loops themselves. The precise canonical structure
exhibited depends on the length of the loop and the identity of certain key residues
involved in its packing. The sixth loop (H3) is much more diverse in both length and
sequence and therefore only exhibits canonical structures for certain short loop lengths
(Martin et al. (1996) J. Mol. Biol., 263: 800; Shirai et al. (1996) FEBS Letters, 399: 1). In
the present invention, all six loops will preferably have canonical structures and hence the
main-chain conformation for the entire antibody molecule will be known.


Antibody polypeptide Antibodies are immunoglobulins that are produced by B cells and
form a central part of the host immune defence system in vertebrates. An antibody
polypeptide, as used herein, is a polypeptide which either is an antibody or is a part of an
antibody, modified or unmodified. Thus, the term antibody polypeptide includes a heavy
chain, a light chain, a heavy chain-light chain dimer, a Fab fragment, a F(ab')2 fragment, a
Dab fragment, or an Fv fragment, including a single chain Fv (scFv). Methods for the
construction of such antibody molecules are well known in the art.


Superantigen Superantigens are antigens, mostly in the form of toxins expressed in
bacteria, which interact with members of the immunoglobulin superfamily outside the
conventional ligand binding sites for these molecules. Staphylococcal enterotoxins
interact with T-cell receptors and have the effect of stimulating CD4+ T-cells.
Superantigens for antibodies include the molecules Protein G that binds the IgG constant
region (Bjorck and Kronvall (1984) J. Immunol., 133: 969; Reis et al. (1984) J. Immunol.,
132: 3091), Protein A that binds the IgG constant region and the VH domain (Forsgren
and Sjoquist (1966) J. Immunol., 97: 822) and Protein L that binds the VL domain (Bjorck
(1988) J. Immunol., 140: 1994).


Preferred Embodiments of the Invention 

The present invention provides a selection system which eliminates (or significantly
reduces the proportion of) non-functional or poorly folded/expressed members of a
polypeptide library whilst enriching for functional, folded and well expressed members
before a selection for specificity against a "target ligand" is carried out. A repertoire of
polypeptide molecules is contacted with a "generic ligand", a protein that has affinity for a
structural feature common to all functional, for example complete and/or correctly folded,
proteins of the relevant class. Note that the term "ligand" is used broadly in reference to
molecules of use in the present invention. As used herein, the term "ligand" refers to any
entity that will bind to or be bound by a member of the polypeptide library.

A significant number of defective proteins present in the initial repertoire fail to bind the
generic ligand and are thereby eliminated. This selective removal of non-functional
polypeptides from a library results in a marked reduction in its actual size, while its
functional size is maintained, with a corresponding increase in its quality. Polypeptides
which are retained by virtue of binding the generic ligand constitute a 'first selected pool'
or 'subset' of the original repertoire. Consequently, this 'subset' is enriched for functional,
well folded and well expressed members of the initial repertoire.
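
The relationship between actual size and functional size can be made concrete with a
small calculation. The sketch below is illustrative only. The library size and functional
fraction are made-up inputs rather than measured values, the function name is
hypothetical, and the model assumes that every functional member is retained. The
`carryover` parameter is an added assumption that lets some non-functional members leak
through as background.

```python
def preselect(actual_size, functional_fraction, carryover=0.0):
    """Model one round of generic-ligand preselection.

    actual_size          total number of members before selection
    functional_fraction  fraction of members that are functional (0 to 1)
    carryover            fraction of non-functional members that leak through

    Returns (actual size after, functional size, functional fraction after).
    Assumes every functional member is retained by the generic ligand.
    """
    functional = actual_size * functional_fraction
    nonfunctional = actual_size - functional
    new_actual = functional + nonfunctional * carryover
    new_fraction = functional / new_actual if new_actual else 0.0
    return new_actual, functional, new_fraction


# Made-up example: one million members, a quarter of them functional.
actual, functional, fraction = preselect(1_000_000, 0.25)
print(actual, functional, fraction)

# Same library with 10% of the non-functional members carried over.
leaky_actual, leaky_functional, leaky_fraction = preselect(1_000_000, 0.25, 0.1)
print(leaky_actual, leaky_functional, round(leaky_fraction, 3))
```

In both cases the functional size is unchanged while the actual size falls, so the
proportion of useful members in the pool rises.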


The polypeptides of the first selected pool or subset are subsequently contacted with at
least one "target ligand", which binds to polypeptides with a given functional specificity.
Such target ligands include, but are not limited to, either half of a receptor/ligand pair
(e.g. a hormone or other cell-signalling molecule, such as a neurotransmitter, and its
cognate receptor), either of a binding pair of cell adhesion molecules, a protein substrate
that is bound by the active site of an enzyme, a protein, peptide or small organic
compound against which a particular antibody is to be directed or even an antibody itself.
Consequently, the use of such a library is less labour-intensive and more economical, in
terms of both time and materials, than is that of a conventional library. In addition, since,
compared to a repertoire which has not been selected with a generic ligand, the first
selected pool will contain a much higher ratio of molecules able to bind the target ligand
to those that are unable to bind the target ligand, there will be a significant reduction of
background during selection with the "target ligand".

Combinatorial selection schemes are also contemplated according to the invention.
Multiple selections of the same initial polypeptide repertoire can be performed in parallel
or in series using different generic and/or target ligands. Thus, the repertoire can first be
selected with a single generic ligand and then subsequently selected in parallel using
different target ligands. The resulting subsets can then be used separately or combined, in
which case the combined subset will have a range of target ligand specificities but a single
generic ligand specificity. Alternatively, the repertoire can first be selected with a single
target ligand and then subsequently selected in parallel using different generic ligands.
The resulting subsets can then be used separately or combined, in which case the
combined subset will have a range of generic ligand specificities but a single target ligand
specificity. The use of more elaborate schemes is also envisaged. For example, the initial
repertoire can be subjected to two rounds of selection using two different generic ligands,
followed by selection with the target ligand. This produces a subset in which all members
bind both generic ligands and the target ligand. Alternatively, if the selection of the initial
repertoire with the two generic ligands is performed in parallel and the resulting subsets
are combined and then selected with the target ligand, the resulting subset binds at least one of
the two generic ligands and the target ligand. Combined or pooled repertoires may be
simple mixtures of the subsets or may be manipulated to physically link the subsets. For
example, VH and VL polypeptides may be individually selected in parallel by binding two
different generic ligands, and subsequently combined at the genetic level onto single
vectors such that they are expressed as combined VH-VL. This repertoire can then be
selected against the target ligand such that the selected members are able to bind both generic
ligands and the target ligand.
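
The difference between selecting in series and selecting in parallel followed by pooling
can be expressed with set operations: selection in series corresponds to an intersection,
whereas pooling parallel selections corresponds to a union. The sketch below is a
schematic illustration only. The clone labels and binding sets are invented for the
example, and the model ignores the physical linking of VH and VL subsets described above.

```python
def select(pool, binders):
    """Keep the members of the pool that bind the ligand (set intersection)."""
    return pool & binders


# Invented example: six clones and the sets of clones each ligand would bind.
repertoire = {"a", "b", "c", "d", "e", "f"}
binds_generic_1 = {"a", "b", "c"}
binds_generic_2 = {"b", "c", "d"}
binds_target = {"c", "d", "e"}

# In series: generic ligand 1, then generic ligand 2, then the target ligand.
series = select(
    select(select(repertoire, binds_generic_1), binds_generic_2), binds_target
)

# In parallel: each generic ligand separately, pool the subsets, then the target.
pooled = select(repertoire, binds_generic_1) | select(repertoire, binds_generic_2)
parallel = select(pooled, binds_target)

print(sorted(series))    # members binding both generic ligands and the target
print(sorted(parallel))  # members binding at least one generic ligand and the target
```

The series subset is always contained in the parallel subset, which reflects the
"both" versus "at least one" distinction drawn in the text.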


The invention encompasses libraries of functional polypeptides selected or selectable by
the methods broadly described above, as well as nucleic acid libraries encoding
polypeptide molecules which may be used in a selection performed according to these
methods (preferably, molecules which comprise a first binding site for a target ligand and
a second binding site for a generic ligand). In addition, the invention provides methods for
detecting, immobilising, purifying or immunoprecipitating one or more members of a
repertoire of functional polypeptides selected using the generic or target ligands according
to the invention.

The invention is particularly applicable to the enrichment of libraries of molecules of the
immunoglobulin superfamily. This is particularly true as regards the generation of
populations of antibodies and T-cell receptors which are functional and have a desired
specificity, as is required for use in diagnostic, therapeutic or prophylactic procedures. To
this end, the invention provides antibody and T-cell receptor libraries wherein all the
members have both natural frameworks and loops of known main-chain conformation, as
well as strategies for useful mutagenesis of the starting sequence and the subsequent
selection of functional variants so generated. Such polypeptide libraries may comprise VH
or Vβ domains or, alternatively, they may comprise VL or Vα domains, or even both VH or
Vβ and VL or Vα domains.

There is significant need in the art for improved libraries of antibody or T-cell receptor
molecules. For example, despite progress in the creation of "single pot" phage-antibody
libraries, several problems still remain. Natural libraries (Marks et al. (1991) J. Mol. Biol.,
222: 581; Vaughan et al. (1996) Nature Biotech., 14: 309) which use rearranged V genes
harvested from human B cells are highly biased due to the positive and negative selection
of the B cells in vivo. This can limit the effective size of phage libraries constructed from
rearranged V genes. In addition, clones derived from natural libraries invariably contain
framework mutations which may affect the antibodies' immunogenicity when used in
human therapy. Synthetic libraries (Hoogenboom & Winter (1992) J. Mol. Biol., 227:
381; Barbas et al. (1992) Proc. Natl. Acad. Sci. USA, 89: 4457; Nissim et al. (1994)
EMBO J., 13: 692; Griffiths et al. (1994) EMBO J., 13: 3245; De Kruif et al. (1995) J.
Mol. Biol., 248: 97) can overcome the problem of bias but they require the use of long
degenerate PCR primers which frequently introduce base-pair deletions into the
assembled V genes. This high degree of randomisation may also lead to the creation of
antibodies which are unable to fold correctly and are therefore also non-functional. In
many cases it is likely that these non-functional members will outnumber the functional
members in a library. Even if the frameworks can be pre-optimised for folding and/or
expression (WO97/08320, Morphosys) by synthesising a set of 'master genes' with
consensus framework sequences and by incorporating amino acid substitutions shown to
improve folding and expression, there remains the problem of immunogenicity since, in
most cases, the consensus sequences do not correspond to any natural framework.
Furthermore, since it is likely that the CDR diversity will also have an effect on folding
and/or expression, it is preferable to optimise the folding and/or expression (and remove
any frame-shifts or stop codons) after the V gene has been fully assembled.

A further problem with existing libraries is that, because the main-chain conformation is
heterogeneous, three-dimensional structural modelling is difficult, since suitable
high-resolution crystallographic data may not be available. This is a particular problem for the
H3 region, where the vast majority of antibodies derived from natural or synthetic
antibody libraries have medium length or long loops and therefore cannot be modelled.

Another problem with existing libraries is the reliance on epitope tags (such as the myc,
FLAG or HIS tags) for detection of expressed antibody fragments. As these are usually
located at the N or C terminal ends of the antibody fragment they tend to be prone to
proteolytic cleavage. Superantigens, such as Protein A and Protein L, can be used to detect
expressed antibody fragments by binding the folded domains themselves but, since they
are VH and VL family specific, only a relatively small proportion of members of any
existing antibody library will bind one of these reagents and an even smaller proportion
will bind to both.

To this end, it would be desirable to have a selection system which could eliminate (or at
least reduce the proportion of) non-functional or poorly folded/expressed members of the
library before selection against the target antigen is carried out, whilst enriching for
functional, folded and well expressed members, all of which are able to bind generic
ligands such as the superantigens Protein A and Protein L. In addition, it would be
advantageous to construct an antibody library wherein all the members have natural
frameworks and have loops with known main-chain conformations.

The invention accordingly provides a method by which a polypeptide repertoire may be
selected to remove non-functional members. This results in a marked reduction in the
actual library size (and a corresponding increase in the quality of the library) without
reducing the functional library size. The invention also provides a method for creating
new polypeptide repertoires wherein all the functional members are able to bind a given
generic ligand. The same generic ligand can be used for the subsequent detection,
immobilisation, purification or immunoprecipitation of any one or more members of the
repertoire.

Any 'naive' or 'immune' antibody repertoire can be used with the present invention to
enrich for functional members and/or to enrich for members that bind a given generic
ligand or ligands. Indeed, since only a small percentage of all human germline VH
segments bind Protein A with high affinity and only a small percentage of all human
germline VL segments bind Protein L with high affinity, preselection with these
superantigens is highly advantageous. Alternatively, pre-selection via the epitope tag
enables non-functional variants to be removed from synthetic libraries. The libraries that
are amenable to preselection include, but are not limited to, libraries comprised of V
genes rearranged in vivo of the type described by Marks et al. (1991) J. Mol. Biol., 222:
581 and Vaughan et al. (1996) Nature Biotech., 14: 309, synthetic libraries whereby
germline V gene segments are 'rearranged' in vitro (Hoogenboom & Winter (1992) J. Mol.
Biol., 227: 381; Nissim et al. (1994) EMBO J., 13: 692; Griffiths et al. (1994) EMBO J.,
13: 3245; De Kruif et al. (1995) J. Mol. Biol., 248: 97) or where synthetic CDRs are
incorporated into a single rearranged V gene (Barbas et al. (1992) Proc. Natl. Acad. Sci.
USA, 89: 4457) or into multiple master frameworks (WO97/08320, Morphosys).

Selection of polypeptides according to the invention

Once a diverse pool of polypeptides is generated, selection according to the invention is
applied. Two broad selection procedures are based upon the order in which the generic
and target ligands are applied; combinatorial variations on these schemes involve the use
of multiple generic and/or target ligands in a given step of a selection. When a
combinatorial scheme is used, the pool of polypeptide molecules may be contacted with,
for example, several target ligands at once, or with each singly, in series; in the latter case,
the resulting selected pools of polypeptides may be kept separate or may, themselves, be
pooled. These selection schemes may be summarized as follows:

a. Selection procedure 1:

Initial polypeptide selection using the generic ligand 

In order to remove non-functional members of the library, a generic ligand is
selected, such that the generic ligand is only bound by functional molecules. For example,
the generic ligand may be a metallic ion, an antibody (in the form of a monoclonal
antibody or a polyclonal mixture of antibodies), half of an enzyme/ligand complex or
organic material; note that ligands of any of these types are, additionally or alternatively,
of use as target ligands according to the invention. Antibody production and metal affinity
chromatography are discussed in detail below. Ideally, these ligands bind a site (e.g. a
peptide tag or superantigen binding site) on the members of the library which is of
constant structure or sequence, which structure is liable to be absent or altered in
non-functional members. In the case of antibody libraries, this method is of use to select from
a library only those functional members which have a binding site for a given
superantigen or monoclonal antibody; such an approach is useful in selecting functional
antibody polypeptides from both natural and synthetic pools thereof.

The superantigens Protein A and/or Protein L are of use in the invention as generic
ligands to select antibody repertoires, since they bind correctly folded VH and VL domains
(which belong to certain VH and VL families), respectively, regardless of the sequence and
structure of the binding site for the target ligand. In addition, Protein A or another
superantigen, Protein G, is of use as a generic ligand to select for folding and/or expression
by binding the heavy chain constant domains of antibodies. Anti-κ and anti-λ antibodies
are also of use in selecting light chain constant domains. Small organic mimetics of
antibodies or of other binding proteins, such as Protein A (Li et al. (1998) Nature
Biotech., 16: 190), are also of use.

When this selection procedure is used, the generic ligand, by its very nature, is able to 
bind all functional members of the preselected repertoire; therefore, this generic ligand (or 
some conjugate thereof) may be used to detect, immobilise, purify or immunoprecipitate 
any member or population of members from the repertoire (whether selected by binding a 
given target ligand or not, as discussed below). Protein detection via immunoassay 
techniques as well as immunoprecipitation of member polypeptides of a repertoire of the 
invention may be performed by the techniques discussed below with regard to the testing 
of antibody selection ligands of use in the invention (see "Antibodies for use as ligands in 
polypeptide selection"). Immobilization may be performed through specific binding of a 
polypeptide member of a repertoire to either a generic or target ligand according to the 
invention which is, itself, linked to a solid or semi-solid support, such as a filter (e.g. of 
nitrocellulose or nylon) or a chromatographic support (including, but not limited to, a 
cellulose, polymer, resin or silica support); covalent attachment of the member 
polypeptide to the generic or target ligand may be performed using any of a number of 
chemical crosslinking agents known to one of skill in the art. Immobilization on a metal 
affinity chromatography support is described below (see "Metallic ligands as use for the 
selection of polypeptides"). Purification may comprise any or a combination of these 
techniques, in particular immunoprecipitation and chromatography by methods well 
known in the art. 

Using this approach, selection with multiple generic ligands can be performed either one
after another, to create a repertoire in which all members bind two or more generic ligands;
separately in parallel, such that the subsets can then be combined (in this case, members
of the preselected repertoire will bind at least one of the generic ligands); or separately,
followed by incorporation into the same polypeptide chain, whereby a large functional
library is created in which all members may be able to bind all the generic ligands used during
preselection. For example, subsets can be selected from one or more libraries using
different generic ligands which bind heavy and light chains of antibody molecules (see
below) and then combined to form a heavy/light chain library, in which the heavy and
light chains are either non-covalently associated or are covalently linked, for example, by
using VH and VL domains in a single-chain Fv context.

Secondary polypeptide selection using the target ligand 

Following the selection step with the generic ligand, the library is screened in
order to identify members that bind to the target ligand. Since it is enriched for functional
polypeptides after selection with the generic ligand, there will be an advantageous
reduction in non-specific ("background") binding during selection with the target ligand.
Furthermore, since selection with the generic ligand produces a marked reduction in
the actual library size (and a corresponding increase in the quality of the library) without
reducing the functional library size, a smaller repertoire should elicit the same diversity of
target ligand specificities and affinities as the larger starting repertoire (that contained many
non-functional and poorly folded/expressed members).

One or more target ligands may be used to select polypeptides from the first selected
polypeptide pool generated using the generic ligand. In the event that two or more target
ligands are used to generate a number of different subsets, two or more of these subsets
may be combined to form a single, more complex subset. A single generic ligand is able
to bind every member of the resulting combined subset; however, a given target ligand
binds only a subset of library members.

b. Selection procedure 2: 

Initial selection of repertoire members with the target ligand 
Here, selection using the target ligand is performed prior to selection using the
generic ligand. Obviously, the same set of polypeptides can result from either scheme, if
such a result is desired. Using this approach, selection with multiple target ligands can be
performed in parallel or by mixing the target ligands for selection. If performed in
parallel, the resulting subsets may, if required, be combined.

Secondary polypeptide selection using the generic ligand 

Subsequent selection of the target ligand binding subset can then be performed using one
or more generic ligands. Whilst this is not a selection for function, since members of the
repertoire that are able to bind to the target ligand are by definition functional, it does
enable subsets that bind to different generic ligands to be isolated. Thus, the target
ligand-selected population can be selected by one generic ligand or by two or more generic
ligands. In this case, the generic ligands can be used one after another, to create a
repertoire in which all members bind the target ligand and two or more generic ligands, or
separately in parallel, such that different (but possibly overlapping) subsets binding the
target ligand and different generic ligands are created. These can then be combined (in
this case, members will bind at least one of the generic ligands).
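
The two selection procedures, and the serial and parallel use of several ligands, can be pictured with a small set-based sketch in Python. This is a toy model only and not part of the described method: the member identifiers, the binder sets and the function names (`select`, `select_serial`, `select_parallel`) are all hypothetical, and real selections enrich rather than filter perfectly. Under those assumptions, serial selection behaves like a set intersection and parallel selection followed by pooling behaves like a union, which is why either order of generic and target selection can give the same final set.

```python
# Toy model: each library member is an identifier; each ligand is represented
# by the (hypothetical) set of members that bind it.
library = {"m1", "m2", "m3", "m4", "m5", "m6"}
binds_a = {"m1", "m2", "m3"}        # assumed binders of generic ligand A
binds_l = {"m2", "m3", "m4"}        # assumed binders of generic ligand L
binds_target = {"m3", "m4", "m5"}   # assumed binders of the target ligand


def select(pool, binders):
    """Keep only the members of `pool` that bind the ligand."""
    return pool & binders


def select_serial(pool, *ligands):
    """Apply ligands one after another: survivors bind every ligand."""
    for binders in ligands:
        pool = select(pool, binders)
    return pool


def select_parallel(pool, *ligands):
    """Select separately with each ligand, then pool the subsets:
    survivors bind at least one ligand."""
    combined = set()
    for binders in ligands:
        combined |= select(pool, binders)
    return combined


# Procedure 1: generic ligands first (in series), then the target ligand.
proc1 = select(select_serial(library, binds_a, binds_l), binds_target)
# Procedure 2: target ligand first, then the generic ligands (in series).
proc2 = select_serial(select(library, binds_target), binds_a, binds_l)
# Procedure 2 with the generic ligands used in parallel and the subsets pooled.
parallel = select_parallel(select(library, binds_target), binds_a, binds_l)

print(sorted(proc1), sorted(proc2), sorted(parallel))
```

In this sketch both procedures leave the same single member, whereas the parallel variant also retains a member that binds only one of the two generic ligands.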

Selection of immunoglobulin-family polypeptide library members

The members of the repertoires or libraries selected in the present invention
advantageously belong to the immunoglobulin superfamily of molecules, in particular,
antibody polypeptides or T-cell receptor polypeptides. For antibodies, it is envisaged that
the method according to this invention may be applied to any of the existing antibody
libraries known in the art (whether natural or synthetic) or to antibody libraries designed
specifically to be preselected with generic ligands (see below).

Construction of libraries of the invention 

a. Selection of the main-chain conformation

The members of the immunoglobulin superfamily all share a similar fold for their
polypeptide chain. For example, although antibodies are highly diverse in terms of their
primary sequence, comparison of sequences and crystallographic structures has revealed
that, contrary to expectation, five of the six antigen binding loops of antibodies (H1, H2,
L1, L2, L3) adopt a limited number of main-chain conformations, or canonical structures
(Chothia and Lesk (1987) supra; Chothia et al. (1989) supra). Analysis of loop lengths and
key residues has therefore enabled prediction of the main-chain conformations of H1, H2,
L1, L2 and L3 found in the majority of human antibodies (Chothia et al. (1992) supra;
Tomlinson et al. (1995) supra; Williams et al. (1996) supra). Although the H3 region is
much more diverse in terms of sequence, length and structure (due to the use of D
segments), it also forms a limited number of main-chain conformations for short loop
lengths which depend on the length and the presence of particular residues, or types of
residue, at key positions in the loop and the antibody framework (Martin et al. (1996)
supra; Shirai et al. (1996) supra).

According to the present invention, libraries of antibody polypeptides are designed in
which certain loop lengths and key residues have been chosen to ensure that the
main-chain conformation of the members is known. Advantageously, these are real
conformations of immunoglobulin superfamily molecules found in nature, to minimize
the chances that they are non-functional, as discussed above. Germline V gene segments
serve as one suitable basic framework for constructing antibody or T-cell receptor
libraries; other sequences are also of use. Variations may occur at a low frequency, such
that a small number of functional members may possess an altered main-chain
conformation, which does not affect their function.

Canonical structure theory is also of use in the invention to assess the number of different
main-chain conformations encoded by antibodies, to predict the main-chain conformation
based on antibody sequences and to choose residues for diversification which do not affect
the canonical structure. It is now known that, in the human Vκ domain, the L1 loop can
adopt one of four canonical structures, the L2 loop has a single canonical structure and
that 90% of human Vκ domains adopt one of four or five canonical structures for the L3
loop (Tomlinson et al. (1995) supra); thus, in the Vκ domain alone, different canonical
structures can combine to create a range of different main-chain conformations. Given
that the Vλ domain encodes a different range of canonical structures for the L1, L2 and L3
loops and that Vκ and Vλ domains can pair with any VH domain which can encode several
canonical structures for the H1 and H2 loops, the number of canonical structure
combinations observed for these five loops is very large. This implies that the generation
of diversity in the main-chain conformation may be essential for the production of a wide
range of binding specificities. However, by constructing an antibody library based on a
single known main-chain conformation it was found, contrary to expectation, that
diversity in the main-chain conformation is not required to generate sufficient diversity to
target substantially all antigens. Even more surprisingly, the single main-chain
conformation need not be a consensus structure - a single naturally occurring
conformation can be used as the basis for an entire library. Thus, in a preferred aspect, the
invention provides a library in which the members encode a single known main-chain
conformation. It is to be understood, however, that occasional variations may occur such
that a small number of functional members may possess an alternative main-chain
conformation, which may be unknown.

The single main-chain conformation that is chosen is preferably commonplace among
molecules of the immunoglobulin superfamily type in question. A conformation is
commonplace when a significant number of naturally occurring molecules are observed to
adopt it. Accordingly, in a preferred aspect of the invention, the natural occurrence of the
different main-chain conformations for each binding loop of an immunoglobulin
superfamily molecule is considered separately and then a naturally occurring
immunoglobulin superfamily molecule is chosen which possesses the desired combination
of main-chain conformations for the different loops. If none is available, the nearest
equivalent may be chosen. Since a disadvantage of immunoglobulin-family polypeptide
libraries of the prior art is that many members have unnatural frameworks or contain
framework mutations (see above), in the case of antibodies or T-cell receptors, it is
preferable that the desired combination of main-chain conformations for the different
loops is created by selecting germline gene segments which encode the desired
main-chain conformations. It is more preferable that the selected germline gene segments are
frequently expressed, and most preferable that they are the most frequently expressed.

In designing antibody libraries, therefore, the incidence of the different main-chain
conformations for each of the six antigen binding loops may be considered separately. For
H1, H2, L1, L2 and L3, a given conformation that is adopted by between 20% and 100%
of the antigen binding loops of naturally occurring molecules is chosen. Typically, its
observed incidence is above 35% (i.e. between 35% and 100%) and, ideally, above 50%
or even above 65%. Since the vast majority of H3 loops do not have canonical structures,
it is preferable to select a main-chain conformation which is commonplace among those
loops which do display canonical structures. For each of the loops, the conformation
which is observed most often in the natural repertoire is therefore selected. In human
antibodies, the most popular canonical structures (CS) for each loop are as follows: H1 -
CS 1 (79% of the expressed repertoire), H2 - CS 3 (46%), L1 - CS 2 of Vκ (39%), L2 -
CS 1 (100%), L3 - CS 1 of Vκ (36%) (calculation assumes a κ:λ ratio of 70:30, Hood et
al. (1967) Cold Spring Harbor Symp. Quant. Biol., 48: 133). For H3 loops that have
canonical structures, a CDR3 length (Kabat et al. (1991) Sequences of proteins of
immunological interest, U.S. Department of Health and Human Services) of seven
residues with a salt-bridge from residue 94 to residue 101 appears to be the most
common. There are at least 16 human antibody sequences in the EMBL data library with
the required H3 length and key residues to form this conformation and at least two
crystallographic structures in the protein data bank which can be used as a basis for
antibody modelling (2cgr and 1tet). The most frequently expressed germline gene
segments that encode this combination of canonical structures are the VH segment 3-23 (DP-47),
the JH segment JH4b, the Vκ segment O2/O12 (DPK9) and the Jκ segment Jκ1. These
segments can therefore be used in combination as a basis to construct a library with the
desired single main-chain conformation.
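
As a small worked illustration of the incidence thresholds given above, the following Python sketch places the quoted incidence of the most popular canonical structure for each loop into the tiers named in the text (at least 20%, above 35%, above 50%, above 65%). The percentages are those quoted above; the dictionary layout and the helper name `tier` are hypothetical and serve only this example.

```python
# Incidence (%) of the most frequent canonical structure (CS) for each loop,
# as quoted in the text for human antibodies.
most_common_cs = {
    "H1": ("CS 1", 79),
    "H2": ("CS 3", 46),
    "L1": ("CS 2 of Vkappa", 39),
    "L2": ("CS 1", 100),
    "L3": ("CS 1 of Vkappa", 36),
}


def tier(incidence):
    """Map an incidence (%) onto the preference tiers named in the text."""
    if incidence > 65:
        return "above 65%"
    if incidence > 50:
        return "above 50%"
    if incidence > 35:
        return "above 35%"
    if incidence >= 20:
        return "between 20% and 35%"
    return "below 20% (would not be chosen)"


tiers = {loop: tier(pct) for loop, (_, pct) in most_common_cs.items()}
for loop, (cs, pct) in most_common_cs.items():
    print(f"{loop}: {cs} ({pct}%) -> {tiers[loop]}")
```

On these figures every chosen conformation clears the 35% level, and H1 and L2 also clear the 65% level.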

Alternatively, instead of choosing the single main-chain conformation based on the
natural occurrence of the different main-chain conformations for each of the binding loops
in isolation, the natural occurrence of combinations of main-chain conformations is used
as the basis for choosing the single main-chain conformation. In the case of antibodies, for
example, the natural occurrence of canonical structure combinations for any two, three,
four, five or for all six of the antigen binding loops can be determined. Here, it is
preferable that the chosen conformation is commonplace in naturally occurring antibodies
and most preferable that it is observed most frequently in the natural repertoire. Thus, in
human antibodies, for example, when natural combinations of the five antigen binding
loops, H1, H2, L1, L2 and L3, are considered, the most frequent combination of canonical
structures is determined and then combined with the most popular conformation for the
H3 loop, as a basis for choosing the single main-chain conformation.

b. Diversification of the canonical sequence

Having selected several known main-chain conformations or, preferably, a single
known main-chain conformation, the library of the invention is constructed by varying the
binding site of the molecule in order to generate a repertoire with structural and/or
functional diversity. This means that variants are generated such that they possess
sufficient diversity in their structure and/or in their function so that they are capable of
providing a range of activities. For example, where the polypeptides in question are
cell-surface receptors, they may possess a diversity of target ligand binding specificities.

The desired diversity is typically generated by varying the selected molecule at one or
more positions. The positions to be changed can be chosen at random or are preferably
selected. The variation can then be achieved either by randomization, during which the
resident amino acid is replaced by any amino acid or analogue thereof, natural or
synthetic, producing a very large number of variants, or by replacing the resident amino
acid with one or more of a defined subset of amino acids, producing a more limited
number of variants.

Various methods have been reported for introducing such diversity. Error-prone PCR
(Hawkins et al. (1992) J. Mol. Biol., 226: 889), chemical mutagenesis (Deng et al. (1994)
J. Biol. Chem., 269: 9533) or bacterial mutator strains (Low et al. (1996) J. Mol. Biol.,
260: 359) can be used to introduce random mutations into the genes that encode the
molecule. Methods for mutating selected positions are also well known in the art and
include the use of mismatched oligonucleotides or degenerate oligonucleotides, with or
without the use of PCR. For example, several synthetic antibody libraries have been
created by targeting mutations to the antigen binding loops. The H3 region of a human
tetanus toxoid-binding Fab has been randomized to create a range of new binding
specificities (Barbas et al. (1992) supra). Random or semi-random H3 and L3 regions
have been appended to germline V gene segments to produce large libraries with
unmutated framework regions (Hoogenboom and Winter (1992) supra; Nissim et al.
(1994) supra; Griffiths et al. (1994) supra; De Kruif et al. (1995) supra). Such
diversification has been extended to include some or all of the other antigen binding loops
(Crameri et al. (1996) Nature Med., 2: 100; Riechmann et al. (1995) Bio/Technology, 13:
475; Morphosys, WO97/08320, supra).

Since loop randomization has the potential to create more than approximately 10^15
structures for H3 alone and a similarly large number of variants for the other five loops, it
is not feasible using current transformation technology or even by using cell free systems
to produce a library representing all possible combinations. For example, in one of the
largest libraries constructed to date, 6 x 10^10 different antibodies, which is only a fraction
of the potential diversity for a library of this design, were generated (Griffiths et al. (1994)
supra).
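
The scale of this gap can be checked with a short calculation. The Python sketch below is illustrative only: it assumes that every randomised position can take any of the 20 natural amino acids (an assumption, since the text does not state how the 10^15 figure was derived), and the function name `variants` is hypothetical.

```python
# Assumption: each randomised position can hold any of the 20 natural amino acids.
AMINO_ACIDS = 20


def variants(n_positions):
    """Number of distinct sequences for n fully randomised positions."""
    return AMINO_ACIDS ** n_positions


# Smallest number of randomised positions whose sequence space exceeds 10^15.
n = 1
while variants(n) <= 10**15:
    n += 1

library_size = 6 * 10**10          # largest library cited in the text
coverage = library_size / 10**15   # fraction of a 10^15-member space sampled

print(n, variants(n))
print(f"fraction sampled: {coverage:.0e}")
```

Under this assumption, twelve fully randomised positions already exceed 10^15 sequences, and a library of 6 x 10^10 members would sample only about six in every hundred thousand of them.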

In addition to the removal of non-functional members and the use of a single known
main-chain conformation, the present invention addresses these limitations by
diversifying only those residues which are directly involved in creating or modifying the
desired function of the molecule. For many molecules, the function will be to bind a target
ligand and therefore diversity should be concentrated in the target ligand binding site,
while avoiding changing residues which are crucial to the overall packing of the molecule
or to maintaining the chosen main-chain conformation; therefore, the invention provides a
library wherein the selected positions to be varied may be those that constitute the binding
site for the target ligand.

Diversification of the canonical sequence as it applies to antibodies 
In the case of an antibody library, the binding site for the target ligand is most
often the antigen binding site. Thus, in a highly preferred aspect, the invention provides
an antibody library in which only those residues in the antigen binding site are varied.
These residues are extremely diverse in the human antibody repertoire and are known to
make contacts in high-resolution antibody/antigen complexes. For example, in L2 it is
known that positions 50 and 53 are diverse in naturally occurring antibodies and are
observed to make contact with the antigen. In contrast, the conventional approach would
have been to diversify all the residues in the corresponding Complementarity Determining
Region (CDR2) as defined by Kabat et al. (1991, supra), some seven residues compared
to the two diversified in the library according to the invention. This represents a
significant improvement in terms of the functional diversity required to create a range of
antigen binding specificities.

In nature, antibody diversity is the result of two processes: somatic recombination of
germline V, D and J gene segments to create a naive primary repertoire (so called
germline and junctional diversity) and somatic hypermutation of the resulting rearranged
V genes. Analysis of human antibody sequences has shown that diversity in the primary
repertoire is focused at the centre of the antigen binding site whereas somatic
hypermutation spreads diversity to regions at the periphery of the antigen binding site that
are highly conserved in the primary repertoire (see Tomlinson et al. (1996) supra). This
complementarity has probably evolved as an efficient strategy for searching sequence
space and, although apparently unique to antibodies, it can easily be applied to other
polypeptide repertoires according to the invention. According to the invention, the
residues which are varied are a subset of those that form the binding site for the target
ligand. Different (including overlapping) subsets of residues in the target ligand binding
site are diversified at different stages during selection, if desired.

In the case of an antibody repertoire, the two-step process of the invention is analogous to
the maturation of antibodies in the human immune system. An initial 'naive' repertoire is
created where some, but not all, of the residues in the antigen binding site are diversified.
As used herein in this context, the term "naive" refers to antibody molecules that have no
pre-determined target ligand. These molecules resemble those which are encoded by the
immunoglobulin genes of an individual who has not undergone immune diversification,
as is the case with fetal and newborn individuals, whose immune systems have not yet
been challenged by a wide variety of antigenic stimuli. This repertoire is then selected
against a range of antigens. If required, further diversity can then be introduced outside
the region diversified in the initial repertoire. This matured repertoire can be selected for
modified function, specificity or affinity.

The invention provides two different naive repertoires of antibodies in which some or all
of the residues in the antigen binding site are varied. The "primary" library mimics the
natural primary repertoire, with diversity restricted to residues at the centre of the antigen
binding site that are diverse in the germline V gene segments (germline diversity) or
diversified during the recombination process (junctional diversity). Those residues which
are diversified include, but are not limited to, H50, H52, H52a, H53, H55, H56, H58,
H95, H96, H97, H98, L50, L53, L91, L92, L93, L94 and L96. In the "somatic" library,
diversity is restricted to residues that are diversified during the recombination process
(junctional diversity) or are highly somatically mutated. Those residues which are
diversified include, but are not limited to: H31, H33, H35, H95, H96, H97, H98, L30,
L31, L32, L34 and L96. All the residues listed above as suitable for diversification in
these libraries are known to make contacts in one or more antibody-antigen complexes.
Since in both libraries not all of the residues in the antigen binding site are varied,
additional diversity is incorporated during selection by varying the remaining residues, if
it is desired to do so. It shall be apparent to one skilled in the art that any subset of any of
these residues (or additional residues which comprise the antigen binding site) can be
used for the initial and/or subsequent diversification of the antigen binding site.
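
The relationship between the two position lists can be made explicit with a short Python sketch. The positions are copied from the lists above (which the text describes as non-limiting); the variable names are hypothetical and the set operations are only an illustration of which positions the two libraries share and which remain available for later diversification.

```python
# Positions diversified in each library, copied from the lists in the text.
primary = {
    "H50", "H52", "H52a", "H53", "H55", "H56", "H58",
    "H95", "H96", "H97", "H98",
    "L50", "L53", "L91", "L92", "L93", "L94", "L96",
}
somatic = {
    "H31", "H33", "H35",
    "H95", "H96", "H97", "H98",
    "L30", "L31", "L32", "L34", "L96",
}

shared = primary & somatic        # diversified in both libraries
primary_only = primary - somatic  # could be varied later in a "somatic" library
somatic_only = somatic - primary  # could be varied later in a "primary" library

print(len(primary), len(somatic), sorted(shared))
print(len(primary_only), len(somatic_only), len(primary | somatic))
```

On these lists, the junctional positions H95 to H98 and L96 are common to both libraries, and each library leaves the other's remaining positions untouched for a subsequent round of diversification.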

In the construction of libraries according to the invention, diversification of chosen 
positions is typically achieved at the nucleic acid level, by altering the coding sequence 
which specifies the sequence of the polypeptide such that a number of possible amino
acids (all 20 or a subset thereof) can be incorporated at that position. Using the IUPAC
nomenclature, the most versatile codon is NNK, which encodes all amino acids as well as 
the TAG stop codon. The NNK codon is preferably used in order to introduce the required 
diversity. Other codons which achieve the same ends are also of use, including the NNN 
codon, which leads to the production of the additional stop codons TGA and TAA. 
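
The coding capacity of the NNK and NNN codons described above can be checked by enumeration. The following is a minimal sketch only, assuming the standard genetic code and the IUPAC meanings N = A/C/G/T and K = G/T; the function names `expand` and `summarize` are illustrative and form no part of the invention.

```python
from itertools import product

# IUPAC nucleotide codes needed for this sketch (N = any base, K = G or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}

# Standard genetic code in the conventional TCAG ordering; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon):
    """Return every concrete codon encoded by a degenerate IUPAC codon."""
    choices = (IUPAC[symbol] for symbol in degenerate_codon)
    return ["".join(bases) for bases in product(*choices)]


def summarize(degenerate_codon):
    """Return (number of codons, number of distinct amino acids, sorted stop codons)."""
    codons = expand(degenerate_codon)
    amino_acids = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = sorted(c for c in codons if CODON_TABLE[c] == "*")
    return len(codons), len(amino_acids), stops


nnk = summarize("NNK")
nnn = summarize("NNN")
```

Under these assumptions, `nnk` reports 32 codons covering all 20 amino acids with TAG as the only stop codon, whereas `nnn` reports 64 codons with the stop codons TAA, TAG and TGA.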


A feature of side-chain diversity in the antigen binding site of human antibodies is a 
pronounced bias which favors certain amino acid residues. If the amino acid composition 
of the ten most diverse positions in each of the VH, Vκ and Vλ regions is summed, more
than 76% of the side-chain diversity comes from only seven different residues, these
being serine (24%), tyrosine (14%), asparagine (11%), glycine (9%), alanine (7%),
aspartate (6%) and threonine (6%). This bias towards hydrophilic residues and small 
residues which can provide main-chain flexibility probably reflects the evolution of 
surfaces which are predisposed to binding a wide range of antigens and may help to 
explain the required promiscuity of antibodies in the primary repertoire. 


Since it is preferable to mimic this distribution of amino acids, the invention provides a
library wherein the distribution of amino acids at the positions to be varied mimics that
seen in the antigen binding site of antibodies. Such bias in the substitution of amino acids,
which permits selection of certain polypeptides (not just antibody polypeptides) against a
range of target ligands, is easily applied to any polypeptide repertoire according to the
invention. There are various methods for biasing the amino acid distribution at the
position to be varied (including the use of tri-nucleotide mutagenesis, WO97/08320,
Morphosys, supra), of which the preferred method, due to ease of synthesis, is the use of
conventional degenerate codons. By comparing the amino acid profile encoded by all
combinations of degenerate codons (with single, double, triple and quadruple degeneracy
in equal ratios at each position) with the natural amino acid use it is possible to calculate
the most representative codon. The codons (AGT)(AGC)T, (AGT)(AGC)C and
(AGT)(AGC)(CT) - that is, DVT, DVC and DVY, respectively, using IUPAC
nomenclature - are those closest to the desired amino acid profile: they encode 22% serine
and 11% each of tyrosine, asparagine, glycine, alanine, aspartate, threonine and cysteine.
Preferably, therefore, libraries are constructed using either the DVT, DVC or DVY codon
at each of the diversified positions.
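
The amino acid profile quoted for the DVT, DVC and DVY codons can be reproduced by enumerating the codons each one encodes. The sketch below assumes the standard genetic code and the IUPAC meanings D = A/G/T, V = A/C/G and Y = C/T; the function name `amino_acid_profile` is illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# IUPAC codes needed here: D = A/G/T, V = A/C/G, Y = C/T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "AGT", "V": "ACG", "Y": "CT"}

# Standard genetic code in the conventional TCAG ordering; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def amino_acid_profile(degenerate_codon):
    """Map each encoded amino acid (one-letter code) to its exact frequency,
    assuming equal ratios of the permitted bases at every position."""
    choices = (IUPAC[symbol] for symbol in degenerate_codon)
    codons = ["".join(bases) for bases in product(*choices)]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: Fraction(n, len(codons)) for aa, n in counts.items()}


dvt = amino_acid_profile("DVT")
dvc = amino_acid_profile("DVC")
dvy = amino_acid_profile("DVY")
```

Each of the three profiles gives serine at 2/9 (about 22%) and tyrosine, asparagine, glycine, alanine, aspartate, threonine and cysteine at 1/9 (about 11%) each, with no stop codons, in agreement with the figures stated above.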


As stated above, polypeptides which make up antibody libraries according to the
invention may be whole antibodies or fragments thereof, such as Fab, F(ab')2, Fv or scFv
fragments, or separate VH or VL domains, any of which is either modified or unmodified.
Of these, single-chain Fv fragments, or scFvs, are of particular use. ScFv fragments, as
well as other antibody polypeptides, are reliably generated by antibody engineering
methods well known in the art. The scFv is formed by connecting the VH and VL genes
using an oligonucleotide that encodes an appropriately designed linker peptide, such as
(Gly-Gly-Gly-Gly-Ser)3 or equivalent linker peptide(s). The linker bridges the C-terminal
end of the first V region and the N-terminal end of the second V region, ordered as either
VH-linker-VL or VL-linker-VH. In principle, the binding site of the scFv can faithfully
reproduce the specificity of the corresponding whole antibody and vice versa.

Similar techniques for the construction of Fv, Fab and F(ab')2 fragments, as well as
chimeric antibody molecules, are well known in the art. When expressing Fv fragments,
precautions should be taken to ensure correct chain folding and association. For Fab and
F(ab')2 fragments, VH and VL polypeptides are combined with constant region segments,
which may be isolated from rearranged genes, germline C genes or synthesised from
antibody sequence data as for V region segments. A library according to the invention
may be a VH or VL library. Thus, separate libraries comprising single VH and VL domains
may be constructed and, optionally, include CH or CL domains, respectively, creating Dab
molecules.

c. Library vector systems according to the invention 

Libraries according to the invention can be used for direct screening using the
generic and/or target ligands or used in a selection protocol that involves a genetic display
package.

Bacteriophage lambda expression systems may be screened directly as bacteriophage
plaques or as colonies of lysogens, both as previously described (Huse et al. (1989)
Science, 246: 1275; Caton and Koprowski (1990) Proc. Natl. Acad. Sci. U.S.A., 87;
Mullinax et al. (1990) Proc. Natl. Acad. Sci. U.S.A., 87: 8095; Persson et al. (1991) Proc.
Natl. Acad. Sci. U.S.A., 88: 2432) and are of use in the invention. Whilst such expression
systems can be used to screen up to 10^6 different members of a library, they are not
really suited to screening of larger numbers (greater than 10^6 members). Other screening
systems rely, for example, on direct chemical synthesis of library members. One early
method involves the synthesis of peptides on a set of pins or rods, such as described in
WO84/03564. A similar method involving peptide synthesis on beads, which forms a
peptide library in which each bead is an individual library member, is described in U.S.
Patent No. 4,631,211 and a related method is described in WO92/00091. A significant
improvement of the bead-based methods involves tagging each bead with a unique
identifier tag, such as an oligonucleotide, so as to facilitate identification of the amino
acid sequence of each library member. These improved bead-based methods are described
in WO93/06121.

Another chemical synthesis method involves the synthesis of arrays of peptides (or
peptidomimetics) on a surface in a manner that places each distinct library member (e.g.,
unique peptide sequence) at a discrete, predefined location in the array. The identity of
each library member is determined by its spatial location in the array. The locations in the
array where binding interactions between a predetermined molecule (e.g., a receptor) and
reactive library members occur are determined, thereby identifying the sequences of the
reactive library members on the basis of spatial location. These methods are described in
U.S. Patent No. 5,143,854; WO90/15070 and WO92/10092; Fodor et al. (1991) Science,
251: 767; Dower and Fodor (1991) Ann. Rep. Med. Chem., 26: 271.

Of particular use in the construction of libraries of the invention are selection display
systems, which enable a nucleic acid to be linked to the polypeptide it expresses. As used
herein, a selection display system is a system that permits the selection, by suitable
display means, of the individual members of the library by binding the generic and/or
target ligands.

Any selection display system may be used in conjunction with a library according to the
invention. Selection protocols for isolating desired members of large libraries are known
in the art, as typified by phage display techniques. Such systems, in which diverse peptide
sequences are displayed on the surface of filamentous bacteriophage (Scott and Smith
(1990) supra), have proven useful for creating libraries of antibody fragments (and the
nucleotide sequences that encode them) for the in vitro selection and amplification of
specific antibody fragments that bind a target antigen. The nucleotide sequences encoding
the VH and VL regions are linked to gene fragments which encode leader signals that
direct them to the periplasmic space of E. coli and as a result the antibody
fragments are displayed on the surface of the bacteriophage, typically as fusions to
bacteriophage coat proteins (e.g., pIII or pVIII). Alternatively, antibody fragments are
displayed externally on lambda phage capsids (phagebodies). An advantage of phage-based
display systems is that, because they are biological systems, selected library
members can be amplified simply by growing the phage containing the selected library
member in bacterial cells. Furthermore, since the nucleotide sequence that encodes the
polypeptide library member is contained on a phage or phagemid vector, sequencing,
expression and subsequent genetic manipulation are relatively straightforward.

Methods for the construction of bacteriophage antibody display libraries and lambda
phage expression libraries are well known in the art (McCafferty et al. (1990) supra; Kang
et al. (1991) Proc. Natl. Acad. Sci. U.S.A., 88: 4363; Clackson et al. (1991) Nature, 352:
624; Lowman et al. (1991) Biochemistry, 30: 10832; Burton et al. (1991) Proc. Natl.
Acad. Sci. U.S.A., 88: 10134; Hoogenboom et al. (1991) Nucleic Acids Res., 19: 4133;
Chang et al. (1991) J. Immunol., 147: 3610; Breitling et al. (1991) Gene, 104: 147; Marks
et al. (1991) supra; Barbas et al. (1992) supra; Hawkins and Winter (1992) J. Immunol.,
22: 867; Marks et al. (1992) J. Biol. Chem., 267: 16007; Lerner et al. (1992) Science, 258:
1313, incorporated herein by reference).

One particularly advantageous approach has been the use of scFv phage-libraries (Huston
et al. (1988) Proc. Natl. Acad. Sci. U.S.A., 85: 5879-5883; Chaudhary et al. (1990) Proc.
Natl. Acad. Sci. U.S.A., 87: 1066-1070; McCafferty et al. (1990) supra; Clackson et al.
(1991) supra; Marks et al. (1991) supra; Chiswell et al. (1992) Trends Biotech., 10: 80;
Marks et al. (1992) supra). Various embodiments of scFv libraries displayed on
bacteriophage coat proteins have been described. Refinements of phage display
approaches are also known, for example as described in WO96/06213 and WO92/01047
(Medical Research Council et al.) and WO97/08320 (Morphosys, supra), which are
incorporated herein by reference.

Other systems for generating libraries of polypeptides or nucleotides involve the use of
cell-free enzymatic machinery for the in vitro synthesis of the library members. In one
method, RNA molecules are selected by alternate rounds of selection against a target
ligand and PCR amplification (Tuerk and Gold (1990) Science, 249: 505; Ellington and
Szostak (1990) Nature, 346: 818). A similar technique may be used to identify DNA
sequences which bind a predetermined human transcription factor (Thiesen and Bach
(1990) Nucleic Acids Res., 18: 3203; Beaudry and Joyce (1992) Science, 257: 635;
WO92/05258 and WO92/14843). In a similar way, in vitro translation can be used to
synthesise polypeptides as a method for generating large libraries. These methods, which
generally comprise stabilised polysome complexes, are described further in WO88/08453,
WO90/05785, WO90/07003, WO91/02076, WO91/05058, and WO92/02536. Alternative
display systems which are not phage-based, such as those disclosed in WO95/22625 and
WO95/11922 (Affymax), use the polysomes to display polypeptides for selection. These
and all the foregoing documents also are incorporated herein by reference.

The invention accordingly provides a method for selecting a polypeptide having a desired
generic and/or target ligand binding site from a repertoire of polypeptides, comprising the
steps of:

a) expressing a library according to the preceding aspects of the invention;

b) contacting the polypeptides with the generic and/or target ligand and selecting
those which bind the generic and/or target ligand;

c) optionally amplifying the selected polypeptide(s) which bind the generic and/or
target ligand; and

d) optionally repeating steps a) - c).

Preferably, steps a)-d) are performed using a phage display system.

Since the invention provides a library of polypeptides which have binding sites for both
generic and target ligands the above selection method can be applied to a selection using
either the generic ligand or the target ligand. Thus, the initial library can be selected using
the generic ligand and then the target ligand or using the target ligand and then the generic
ligand. The invention also provides for multiple selections using different generic ligands
either in parallel or in series before or after selection with the target ligand.
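
The two orders of selection described above can be illustrated with a deliberately idealised sketch in which binding is treated as all-or-nothing. This is an assumption made for illustration only: real selections are enrichments rather than perfect filters, and the names `Member`, `select` and the four example clones are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical library member with idealised, all-or-nothing binding."""
    name: str
    binds_generic: bool  # recognised by the generic ligand
    binds_target: bool   # recognised by the target ligand


def select(library, attribute):
    """Keep only the members for which the named binding attribute is True."""
    return [member for member in library if getattr(member, attribute)]


library = [
    Member("clone_1", binds_generic=True, binds_target=True),
    Member("clone_2", binds_generic=True, binds_target=False),
    Member("clone_3", binds_generic=False, binds_target=True),
    Member("clone_4", binds_generic=False, binds_target=False),
]

# Generic ligand first, then target ligand.
generic_then_target = select(select(library, "binds_generic"), "binds_target")

# Target ligand first, then generic ligand.
target_then_generic = select(select(library, "binds_target"), "binds_generic")
```

In this idealised model both orders recover the same member (`clone_1`); what differs is the intermediate population, which after the generic ligand step is the functional subset of the library irrespective of target specificity.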

Preferably, the method according to the invention further comprises the steps of
subjecting the selected polypeptide(s) to additional variation (as described herein) and
repeating steps a) to d).

Since the generic ligand, by its very nature, is able to bind all library members selected
using the generic ligand, the method according to the invention further comprises the use
of the generic ligand (or some conjugate thereof) to detect, immobilise, purify or
immunoprecipitate any functional member or population of members from the library
(whether selected by binding the target ligand or not).

Since the invention provides a library in which the members have a known main-chain
conformation the method according to the invention further comprises the production of a
three-dimensional structural model of any functional member of the library (whether
selected by binding the target ligand or not). Preferably, the building of such a model
involves homology modelling and/or molecular replacement. A preliminary model of the
main-chain conformation can be created by comparison of the polypeptide sequence to the
sequence of a known three-dimensional structure, by secondary structure prediction or by
screening structural libraries. Computational software may also be used to predict the
secondary structure of the polypeptide. In order to predict the conformations of the
side-chains at the varied positions, a side-chain rotamer library may be employed.

In general, the nucleic acid molecules and vector constructs required for the performance
of the present invention are available in the art and may be constructed and manipulated
as set forth in standard laboratory manuals, such as Sambrook et al. (1989) Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor, USA.

The manipulation of nucleic acids in the present invention is typically carried out in
recombinant vectors. As used herein, vector refers to a discrete element that is used to
introduce heterologous DNA into cells for the expression and/or replication thereof.
Methods by which to select or construct and, subsequently, use such vectors are well
known to one of moderate skill in the art. Numerous vectors are publicly available,
including bacterial plasmids, bacteriophage, artificial chromosomes and episomal vectors.
Such vectors may be used for simple cloning and mutagenesis; alternatively, as is typical
of vectors in which repertoire (or pre-repertoire) members of the invention are carried, a
gene expression vector is employed. A vector of use according to the invention may be
selected to accommodate a polypeptide coding sequence of a desired size, typically from
0.25 kilobase (kb) to 40 kb in length. A suitable host cell is transformed with the vector
after in vitro cloning manipulations. Each vector contains various functional components,
which generally include a cloning (or "polylinker") site, an origin of replication and at
least one selectable marker gene. If a given vector is an expression vector, it additionally
possesses one or more of the following: enhancer element, promoter, transcription
termination and signal sequences, each positioned in the vicinity of the cloning site, such
that they are operatively linked to the gene encoding a polypeptide repertoire member
according to the invention.


Both cloning and expression vectors generally contain nucleic acid sequences that enable
the vector to replicate in one or more selected host cells. Typically in cloning vectors, this
sequence is one that enables the vector to replicate independently of the host
chromosomal DNA and includes origins of replication or autonomously replicating
sequences. Such sequences are well known for a variety of bacteria, yeast and viruses.
The origin of replication from the plasmid pBR322 is suitable for most Gram-negative
bacteria, the 2 micron plasmid origin is suitable for yeast, and various viral origins (e.g.
SV40, adenovirus) are useful for cloning vectors in mammalian cells. Generally, the
origin of replication is not needed for mammalian expression vectors unless these are used
in mammalian cells able to replicate high levels of DNA, such as COS cells.

Advantageously, a cloning or expression vector may contain a selection gene, also referred
to as a selectable marker. This gene encodes a protein necessary for the survival or growth
of transformed host cells grown in a selective culture medium. Host cells not transformed
with the vector containing the selection gene will therefore not survive in the culture
medium. Typical selection genes encode proteins that confer resistance to antibiotics and
other toxins, e.g. ampicillin, neomycin, methotrexate or tetracycline, complement
auxotrophic deficiencies, or supply critical nutrients not available in the growth media.

Since the replication of vectors according to the present invention is most conveniently
performed in E. coli, an E. coli-selectable marker, for example, the β-lactamase gene that
confers resistance to the antibiotic ampicillin, is of use. These can be obtained from E.
coli plasmids, such as pBR322 or a pUC plasmid such as pUC18 or pUC19.

Expression vectors usually contain a promoter that is recognised by the host organism and
is operably linked to the coding sequence of interest. Such a promoter may be inducible or
constitutive. The term "operably linked" refers to a juxtaposition wherein the components
described are in a relationship permitting them to function in their intended manner. A
control sequence "operably linked" to a coding sequence is ligated in such a way that
expression of the coding sequence is achieved under conditions compatible with the
control sequences.

Promoters suitable for use with prokaryotic hosts include, for example, the β-lactamase
and lactose promoter systems, alkaline phosphatase, the tryptophan (trp) promoter system
and hybrid promoters such as the tac promoter. Promoters for use in bacterial systems will
also generally contain a Shine-Dalgarno sequence operably linked to the coding sequence.

In the library according to the present invention, the preferred vectors are expression
vectors that enable the expression of a nucleotide sequence corresponding to a
polypeptide library member. Thus, selection with the generic and/or target ligands can be
performed by separate propagation and expression of a single clone expressing the
polypeptide library member or by use of any selection display system. As described
above, the preferred selection display system is bacteriophage display. Thus, phage or
phagemid vectors may be used. The preferred vectors are phagemid vectors which have an
E. coli origin of replication (for double stranded replication) and also a phage origin of
replication (for production of single-stranded DNA). The manipulation and expression of
such vectors is well known in the art (Hoogenboom and Winter (1992) supra; Nissim et
al. (1994) supra). Briefly, the vector contains a β-lactamase gene to confer selectivity on
the phagemid and a lac promoter upstream of an expression cassette that consists (N to C
terminal) of a pelB leader sequence (which directs the expressed polypeptide to the
periplasmic space), a multiple cloning site (for cloning the nucleotide version of the
library member), optionally, one or more peptide tags (for detection), optionally, one or
more TAG stop codons and the phage protein pIII. Thus, using various suppressor and
non-suppressor strains of E. coli and with the addition of glucose, isopropyl thio-β-D-galactoside
(IPTG) or a helper phage, such as VCS M13, the vector is able to replicate as
a plasmid with no expression, produce large quantities of the polypeptide library member
only or produce phage, some of which contain at least one copy of the polypeptide-pIII
fusion on their surface.

Construction of vectors according to the invention employs conventional ligation 
techniques. Isolated vectors or DNA fragments are cleaved, tailored, and religated in the 
form desired to generate the required vector. If desired, analysis to confirm that the correct 
sequences are present in the constructed vector can be performed in a known fashion. 
Suitable methods for constructing expression vectors, preparing in vitro transcripts, 
introducing DNA into host cells, and performing analyses for assessing expression and 
function are known to those skilled in the art. The presence of a gene sequence in a 
sample is detected, or its amplification and/or expression quantified by conventional 
methods, such as Southern or Northern analysis, Western blotting, dot blotting of DNA, 
RNA or protein, in situ hybridization, immunocytochemistry or sequence analysis of 
nucleic acid or protein molecules. Those skilled in the art will readily envisage how these 
methods may be modified, if desired. 

Mutagenesis using the polymerase chain reaction (PCR) 

Once a vector system is chosen and one or more nucleic acid sequences encoding
polypeptides of interest are cloned into the library vector, one may generate diversity
within the cloned molecules by undertaking mutagenesis prior to expression;
alternatively, the encoded proteins may be expressed and selected, as described above,
before mutagenesis and additional rounds of selection are performed. As stated above,
mutagenesis of nucleic acid sequences encoding structurally optimized polypeptides is
carried out by standard molecular methods. Of particular use is the polymerase chain
reaction, or PCR (Mullis and Faloona (1987) Methods Enzymol., 155: 335, herein
incorporated by reference). PCR, which uses multiple cycles of DNA replication
catalyzed by a thermostable, DNA-dependent DNA polymerase to amplify the target
sequence of interest, is well known in the art.

Oligonucleotide primers useful according to the invention are single-stranded DNA or
RNA molecules that hybridize to a nucleic acid template to prime enzymatic synthesis of
a second nucleic acid strand. The primer is complementary to a portion of a target
molecule present in a pool of nucleic acid molecules used in the preparation of sets of
arrays of the invention. It is contemplated that such a molecule is prepared by synthetic
methods, either chemical or enzymatic. Alternatively, such a molecule or a fragment
thereof is naturally occurring, and is isolated from its natural source or purchased from a
commercial supplier. Mutagenic oligonucleotide primers are 15 to 100 nucleotides in
length, ideally from 20 to 40 nucleotides, although oligonucleotides of different length are
of use.

Typically, selective hybridization occurs when two nucleic acid sequences are
substantially complementary (at least about 65% complementary over a stretch of at least
14 to 25 nucleotides, preferably at least about 75%, more preferably at least about 90%
complementary). See Kanehisa (1984) Nucleic Acids Res. 12: 203, incorporated herein by
reference. As a result, it is expected that a certain degree of mismatch at the priming site
is tolerated. Such mismatch may be small, such as a mono-, di- or tri-nucleotide.
Alternatively, it may comprise nucleotide loops, which we define as regions in which
mismatch encompasses an uninterrupted series of four or more nucleotides.
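
The complementarity thresholds and the definition of a nucleotide loop given above can be expressed as a short calculation. The following sketch assumes DNA sequences written 5' to 3', a primer and priming site of equal length, and simple position-by-position pairing with no gaps; the function names and the example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mismatch_flags(primer, template_site):
    """Per-position mismatch flags for a primer annealed antiparallel to its site."""
    if len(primer) != len(template_site):
        raise ValueError("primer and template site must have the same length")
    perfect_primer = reverse_complement(template_site)
    return [p != q for p, q in zip(primer, perfect_primer)]


def percent_complementarity(primer, template_site):
    """Percentage of primer positions that base-pair with the template site."""
    flags = mismatch_flags(primer, template_site)
    return 100.0 * flags.count(False) / len(flags)


def longest_mismatch_run(primer, template_site):
    """Length of the longest uninterrupted series of mismatched positions;
    a run of four or more corresponds to a 'loop' as defined in the text."""
    longest = current = 0
    for mismatched in mismatch_flags(primer, template_site):
        current = current + 1 if mismatched else 0
        longest = max(longest, current)
    return longest


# Hypothetical 20-nucleotide priming site and two primers directed against it.
site = "ATGGCCAAGCTTGGATCCGA"
perfect = reverse_complement(site)
mutagenic = perfect[:8] + "CCT" + perfect[11:]  # tri-nucleotide mismatch

perfect_score = percent_complementarity(perfect, site)
mutagenic_score = percent_complementarity(mutagenic, site)
mutagenic_run = longest_mismatch_run(mutagenic, site)
```

Here the mutagenic primer remains 85% complementary over 20 nucleotides, above the "at least about 75%" level mentioned above, and its longest mismatch run of three nucleotides falls short of the four required for a loop.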

Overall, five factors influence the efficiency and selectivity of hybridization of the primer
to a second nucleic acid molecule. These factors, which are (i) primer length, (ii) the
nucleotide sequence and/or composition, (iii) hybridization temperature, (iv) buffer
chemistry and (v) the potential for steric hindrance in the region to which the primer is
required to hybridize, are important considerations when non-random priming sequences
are designed.

There is a positive correlation between primer length and both the efficiency and accuracy
with which a primer will anneal to a target sequence; longer sequences have a higher
melting temperature (Tm) than do shorter ones, and are less likely to be repeated within a
given target sequence, thereby minimizing promiscuous hybridization. Primer sequences
with a high G-C content or that comprise palindromic sequences tend to self-hybridize, as
do their intended target sites, since unimolecular, rather than bimolecular, hybridization
kinetics are generally favored in solution; at the same time, it is important to design a
primer containing sufficient numbers of G-C nucleotide pairings to bind the target
sequence tightly, since each such pair is bound by three hydrogen bonds, rather than the
two that are found when A and T bases pair. Hybridization temperature varies inversely
with primer annealing efficiency, as does the concentration of organic solvents, e.g.
formamide, that might be included in a hybridization mixture, while increases in salt
concentration facilitate binding. Under stringent hybridization conditions, longer probes
hybridize more efficiently than do shorter ones, which are sufficient under more
permissive conditions. Stringent hybridization conditions typically include salt
concentrations of less than about 1M, more usually less than about 500 mM and
preferably less than about 200 mM. Hybridization temperatures range from as low as 0°C
to greater than 22°C, greater than about 30°C, and (most often) in excess of about 37°C.
Longer fragments may require higher hybridization temperatures for specific
hybridization. As several factors affect the stringency of hybridization, the combination of
parameters is more important than the absolute measure of any one alone.
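
The dependence of melting temperature on primer length and G-C content can be illustrated with a rough calculation. The sketch below uses the Wallace rule (Tm = 2°C per A or T plus 4°C per G or C), a rule of thumb for short oligonucleotides that is not taken from this specification and that ignores salt, solvent and mismatch effects; the function names and example primers are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough Tm estimate in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def is_palindromic(primer):
    """True if the primer equals its own reverse complement, in which case
    it can anneal to itself."""
    primer = primer.upper()
    reverse_complement = "".join(COMPLEMENT[b] for b in reversed(primer))
    return primer == reverse_complement


short_primer = "ATGC" * 5   # hypothetical 20-mer, 50% G-C
long_primer = "ATGC" * 6    # hypothetical 24-mer, 50% G-C

short_tm = wallace_tm(short_primer)
long_tm = wallace_tm(long_primer)
ecori_site_is_palindrome = is_palindromic("GAATTC")
```

With these inputs the 24-mer gives a higher estimate (72°C) than the 20-mer (60°C) at the same G-C fraction, consistent with the positive correlation between length and Tm described above. Programs such as those named below apply more detailed models.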

Primers are designed with these considerations in mind. While estimates of the relative
merits of numerous sequences may be made mentally by one of skill in the art, computer
programs have been designed to assist in the evaluation of these several parameters and
the optimization of primer sequences. Examples of such programs are "PrimerSelect" of
the DNAStar™ software package (DNAStar, Inc.; Madison, WI) and OLIGO 4.0
(National Biosciences, Inc.). Once designed, suitable oligonucleotides are prepared by a
suitable method, e.g. the phosphoramidite method described by Beaucage and Caruthers
(1981) Tetrahedron Lett., 22: 1859 or the triester method according to Matteucci and
Caruthers (1981) J. Am. Chem. Soc., 103: 3185, both incorporated herein by reference, or
by other chemical methods using either a commercial automated oligonucleotide
synthesizer or VLSIPS™ technology.

PCR is performed using template DNA (at least 1 fg; more usefully, 1-1000 ng) and at
least 25 pmol of oligonucleotide primers; it may be advantageous to use a larger amount
of primer when the primer pool is heavily heterogeneous, as each sequence is represented
by only a small fraction of the molecules of the pool, and amounts become limiting in the
later amplification cycles. A typical reaction mixture includes: 2 µl of DNA, 25 pmol of
oligonucleotide primer, 2.5 µl of 10X PCR buffer 1 (Perkin-Elmer, Foster City, CA), 0.4
µl of 1.25 µM dNTP, 0.15 µl (or 2.5 units) of Taq DNA polymerase (Perkin Elmer, Foster
City, CA) and deionized water to a total volume of 25 µl. Mineral oil is overlaid and the
PCR is performed using a programmable thermal cycler.

The length and temperature of each step of a PCR cycle, as well as the number of cycles,
are adjusted in accordance with the stringency requirements in effect. Annealing temperature
and timing are determined both by the efficiency with which a primer is expected to
anneal to a template and the degree of mismatch that is to be tolerated; obviously, when
nucleic acid molecules are simultaneously amplified and mutagenized, mismatch is
required, at least in the first round of synthesis. In attempting to amplify a population of
molecules using a mixed pool of mutagenic primers, the loss, under stringent
(high-temperature) annealing conditions, of potential mutant products that would only result
from low melting temperatures is weighed against the promiscuous annealing of primers
to sequences other than the target site. The ability to optimize the stringency of primer
annealing conditions is well within the knowledge of one of moderate skill in the art. An
annealing temperature of between 30°C and 72°C is used. Initial denaturation of the
template molecules normally occurs at between 92°C and 99°C for 4 minutes, followed by
20-40 cycles consisting of denaturation (94-99°C for 15 seconds to 1 minute), annealing
(temperature determined as discussed above; 1-2 minutes), and extension (72°C for 1-5
minutes, depending on the length of the amplified product). Final extension is generally
for 4 minutes at 72°C, and may be followed by an indefinite (0-24 hour) step at 4°C.
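
The cycling regime set out above can be written down as a simple programme in order to check parameter ranges and estimate run time. This is a minimal sketch: the particular values chosen within the stated ranges (94°C denaturation for 30 seconds, 1 minute annealing, 30 cycles) are illustrative assumptions, ramp times between temperatures and the optional 4°C hold are ignored, and the names `Step`, `build_program` and `total_minutes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def build_program(annealing_temp_c, cycles=30, extension_seconds=60):
    """Build a flat list of thermal cycler steps within the ranges given in the text."""
    if not 30 <= annealing_temp_c <= 72:
        raise ValueError("annealing temperature should lie between 30 and 72 deg C")
    if not 20 <= cycles <= 40:
        raise ValueError("cycle number should lie between 20 and 40")
    if not 60 <= extension_seconds <= 300:
        raise ValueError("extension time should lie between 1 and 5 minutes")

    initial = [Step("initial denaturation", 94, 240)]   # 4 minutes
    cycle = [
        Step("denaturation", 94, 30),                   # assumed 30 seconds
        Step("annealing", annealing_temp_c, 60),        # assumed 1 minute
        Step("extension", 72, extension_seconds),
    ]
    final = [Step("final extension", 72, 240)]          # 4 minutes
    return initial + cycle * cycles + final


def total_minutes(program):
    """Total programmed hold time in minutes, excluding ramping between steps."""
    return sum(step.seconds for step in program) / 60


program = build_program(annealing_temp_c=55)
run_time = total_minutes(program)
```

For a mixed pool of mutagenic primers, lowering `annealing_temp_c` in such a programme is the parameter that trades recovery of mismatched products against promiscuous priming, as discussed above.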

Structural analysis of repertoire members 

Since the invention provides a repertoire of polypeptides of known main-chain 
conformation, a three-dimensional structural model of any member of the repertoire is 
easily generated. Typically, the building of such a model involves homology modelling 
and/or molecular replacement. A preliminary model of the main-chain conformation is 
created by comparison of the polypeptide sequence to a similar sequence of known three- 
dimensional structure, by secondary structure prediction or by screening structural 
libraries. Molecular modelling computer software packages are commercially available, 
and are useful in predicting polypeptide secondary structures. In order to predict the 
conformations of the side-chains at the varied positions, a side-chain rotamer library may 
be employed. 

Antibodies for use as ligands in polypeptide selection 

A generic or target ligand to be used in the polypeptide selection according to the
present invention may, itself, be an antibody. This is particularly true of generic ligands,
which bind to structural features that are substantially conserved in functional
polypeptides to be selected for inclusion in repertoires of the invention. If an appropriate
antibody is not publicly available, it may be produced by phage display methodology (see
above) or as follows:

Either recombinant proteins or those derived from natural sources can be used to
generate antibodies using standard techniques, well known to those in the field. For
example, the protein (or "immunogen") is administered to challenge a mammal such as a
monkey, goat, rabbit or mouse. The resulting antibodies can be collected as polyclonal
sera, or antibody-producing cells from the challenged animal can be immortalized (e.g. by
fusion with an immortalizing fusion partner to produce a hybridoma), which cells then
produce monoclonal antibodies.

a. Polyclonal antibodies 

The antigen protein is either used alone or conjugated to a conventional carrier in
order to increase its immunogenicity, and an antiserum to the peptide-carrier conjugate is
raised in an animal, as described above. Coupling of a peptide to a carrier protein and
immunizations may be performed as described (Dymecki et al. (1992) J. Biol. Chem.,
267: 4815). The serum is titered against protein antigen by ELISA or alternatively by dot
or spot blotting (Boersma and Van Leeuwen (1994) J. Neurosci. Methods, 51: 317). The
serum is shown to react strongly with the appropriate peptides by ELISA, for example,
following the procedures of Green et al. (1982) Cell, 28: 477.

b. Monoclonal antibodies 

Techniques for preparing monoclonal antibodies are well known, and monoclonal
antibodies may be prepared using any candidate antigen, preferably bound to a carrier, as
described by Arnheiter et al. (1981) Nature, 294: 278. Monoclonal antibodies are
typically obtained from hybridoma tissue cultures or from ascites fluid obtained from
animals into which the hybridoma tissue was introduced. Nevertheless, monoclonal
antibodies may be described as being "raised against" or "induced by" a protein.

After being raised, monoclonal antibodies are tested for function and specificity by any of 
a number of means. Similar procedures can also be used to test recombinant antibodies 
produced by phage display or other in vitro selection technologies. Monoclonal
antibody-producing hybridomas (or polyclonal sera) can be screened for antibody binding to
the immunogen, as well. Particularly preferred immunological tests include enzyme-linked
immunoassays (ELISA), immunoblotting and immunoprecipitation (see Voller, (1978)
Diagnostic Horizons, 2: 1, Microbiological Associates Quarterly Publication,
Walkersville, MD; Voller et al. (1978) J. Clin. Pathol., 31: 507; U.S. Reissue Pat. No.
31,006; UK Patent 2,019,408; Butler (1981) Methods Enzymol., 73: 482; Maggio, E.
(ed.), (1980) Enzyme Immunoassay, CRC Press, Boca Raton, FL) or radioimmunoassays
(RIA) (Weintraub, B., Principles of radioimmunoassays, Seventh Training Course on
Radioligand Assay Techniques, The Endocrine Society, March 1986, pp. 1-5, 46-49 and
68-78), all to detect binding of the antibody to the immunogen against which it was
raised. It will be apparent to one skilled in the art that either the antibody molecule or the
immunogen must be labeled to facilitate such detection. Techniques for labeling antibody
molecules are well known to those skilled in the art (see Harlow and Lane (1989)
Antibodies, Cold Spring Harbor Laboratory, pp. 1-726).
Alternatively, other techniques can be used to detect binding to the immunogen, thereby
confirming the integrity of the antibody which is to serve either as a generic antigen or a
target antigen according to the invention. These include chromatographic methods such as
SDS PAGE, isoelectric focusing, Western blotting, HPLC and capillary electrophoresis.

"Antibodies" are defined herein as constructions using the binding (variable) region of 
such antibodies, and other antibody modifications. Thus, an antibody useful in the 
invention may comprise whole antibodies, antibody fragments, polyfunctional antibody
aggregates, or in general any substance comprising one or more specific binding sites
from an antibody. The antibody fragments may be fragments such as Fv, Fab and F(ab')2
fragments or any derivatives thereof, such as single chain Fv fragments. The antibodies
or antibody fragments may be non-recombinant, recombinant or humanized. The antibody
may be of any immunoglobulin isotype, e.g., IgG, IgM, and so forth. In addition,
aggregates, polymers, derivatives and conjugates of immunoglobulins or their fragments
can be used where appropriate. 

The invention is further described, for the purposes of illustration only, in the following 
examples. 

Metallic ions as ligands for the selection of polypeptides 

As stated above, ligands other than antibodies are of use in the selection of 
polypeptides according to the invention. One such category of ligand is that of metallic 
ions. For example, one may wish to preselect a repertoire for the presence of a functional
histidine (HIS) tag using a Ni-NTA matrix. Immobilized metal affinity chromatography
(IMAC; Hubert and Porath (1980) J. Chromatography, 98: 247) takes advantage of the
metal-binding properties of histidine and cysteine amino acid residues, as well as others
that may bind metals, on the exposed surfaces of numerous proteins. It employs a resin,
typically agarose, comprising a bidentate metal chelator (e.g. iminodiacetic acid, IDA, a
dicarboxylic acid group) to which are complexed metallic ions; in order to generate a
metallic-ion-bearing resin according to the invention, agarose/IDA is mixed with a metal
salt (for example, CuCl2·2H2O), from which the IDA chelates the divalent cations. One
commercially available agarose/IDA preparation is "CHELATING SEPHAROSE 6B"
(Pharmacia Fine Chemicals; Piscataway, NJ). Metallic ions that are of use include, but are
not limited to, the divalent cations Ni2+, Cu2+, Zn2+ and Co2+. A pool of polypeptide
molecules is prepared in a binding buffer which consists essentially of salt (typically,
NaCl or KCl) at a 0.1 to 1.0 M concentration and a weak ligand (such as Tris or
ammonia), the latter of which has affinity for the metallic ions of the resin, but to a lesser
degree than does a polypeptide to be selected according to the invention. Useful
concentrations of the weak ligand range from 0.01 to 0.1 M in the binding buffer.

The polypeptide pool is contacted with the resin under conditions which permit
polypeptides having metal-binding domains (see below) to bind; after impurities are
washed away, the selected polypeptides are eluted with a buffer in which the weak ligand
is present in a higher concentration than in the binding buffer, specifically, at a
concentration sufficient for the weak ligand to displace the selected polypeptides, despite
its lower binding affinity for the metallic ions. Useful concentrations of the weak ligand in
the elution buffer are 10- to 50-fold higher than in the binding buffer, typically from 0.1 to
0.3 M; note that the concentration of salt in the elution buffer equals that in the binding
buffer. According to the methods of the present invention, the metallic ions of the resin
typically serve as the generic ligand; however, it is contemplated that they may also be
used as the target ligand.
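
The relationship between the binding-buffer and elution-buffer concentrations of the weak ligand can be sketched as a short calculation. This is an illustration only and not part of the described method: the function name `elution_ligand_range` and its parameters are hypothetical, and the only values taken from the text are the 0.01 to 0.1 M binding range and the 10- to 50-fold increase.

```python
def elution_ligand_range(binding_molar, fold_low=10.0, fold_high=50.0):
    """Return (low, high) weak-ligand molarity for the elution buffer.

    binding_molar is the weak-ligand concentration in the binding buffer.
    The text gives 0.01 to 0.1 M as the useful range, so values outside
    it are rejected. The fold defaults are the 10- to 50-fold range quoted
    in the text; the function name itself is hypothetical.
    """
    if not 0.01 <= binding_molar <= 0.1:
        raise ValueError("binding concentration outside 0.01-0.1 M")
    return binding_molar * fold_low, binding_molar * fold_high


low, high = elution_ligand_range(0.01)
print(f"elute with {low:.2f} to {high:.2f} M weak ligand")
```

For a binding buffer at the lower end of the stated range (0.01 M), the computed window overlaps the 0.1 to 0.3 M range that the text describes as typical.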

IMAC is carried out using a standard chromatography apparatus (columns, through which
buffer is drawn by gravity, pulled by a vacuum or driven by pressure); alternatively, a
large-batch procedure is employed, in which the metal-bearing resin is mixed, in slurry
form, with the polypeptide pool from which members of a repertoire of the invention are
to be selected.

Partial purification of a serum T4 protein by IMAC has been described (Staples et al.,
U.S. Patent No. 5,169,936); however, the broad spectrum of proteins comprising
surface-exposed metal-binding domains also encompasses other soluble T4 proteins, human
serum proteins (e.g. IgG, haptoglobin, hemopexin, Gc-globulin, C1q, C3, C4), human
desmoplasmin, Dolichos biflorus lectin, zinc-inhibited Tyr(P) phosphatases, phenolase,
carboxypeptidase isoenzymes, Cu,Zn superoxide dismutases (including those of humans
and all other eukaryotes), nucleoside diphosphatase, leukocyte interferon, lactoferrin,
human plasma α2-SH glycoprotein, α2-macroglobulin, α1-antitrypsin, plasminogen
activator, gastrointestinal polypeptides, pepsin, human and bovine serum albumin, granule
proteins from granulocytes, lysozymes, non-histone proteins, human fibrinogen, human
serum transferrin, human lymphotoxin, calmodulin, protein A, avidin, myoglobins,
somatomedins, human growth hormone, transforming growth factors, platelet-derived
growth factor, α-human atrial natriuretic polypeptide, cardiodilatin and others. In
addition, extracellular domain sequences of membrane-bound proteins may be purified
using IMAC. Note that repertoires comprising any of the above proteins or metal-binding
variants thereof may be produced according to the methods of the invention.

Following elution, selected polypeptides are removed from the metal binding buffer and
placed in a buffer appropriate to their next use. If the metallic ion has been used to 
generate a first selected polypeptide pool according to the invention, the molecules of that 
pool are placed into a buffer that is optimized for binding with the second ligand to be 
used in selection of the members of the functional polypeptide repertoire. If the metal is,
instead, used in the second selection step, the polypeptides of the repertoire are transferred
to a buffer suitable either to storage (e.g. a 0.5% glycine buffer) or the use for which they
are intended. Such buffers include, but are not limited to: water, organic solvents,
mixtures of water and water-miscible organic solvents, physiological salt buffers and
protein/nucleic acid or protein/protein binding buffers. Alternatively, the polypeptide
molecules may be dehydrated (i.e. by lyophilization) or immobilized on a solid or semi- 
solid support, such as a nitrocellulose or nylon filtration membrane or a gel matrix (i.e. of 
agarose or polyacrylamide) or crosslinked to a chromatography resin. 

Polypeptide molecules may be removed from the elution buffer by any of a number of
methods known in the art. The polypeptide eluate may be dialyzed against water or
another solution of choice; if the polypeptides are to be lyophilized, water to which
protease inhibitors (e.g. pepstatin, aprotinin, leupeptin, or others) have been added is used.
Alternatively, the sample may be subjected to ammonium sulfate precipitation, which is
well known in the art, prior to resuspension in the medium of choice.

Use of polypeptides selected according to the invention 

Polypeptides selected according to the method of the present invention may be 
employed in substantially any process which involves ligand-polypeptide binding, 
including in vivo therapeutic and prophylactic applications, in vitro and in vivo diagnostic
applications, in vitro assay and reagent applications, and the like. For example, in the case 
of antibodies, antibody molecules may be used in antibody based assay techniques, such 
as ELISA techniques, according to methods known to those skilled in the art. 

As alluded to above, the molecules selected according to the invention are of use in
diagnostic, prophylactic and therapeutic procedures. For example, enzyme variants
generated and selected by these methods may be assayed for activity, either in vitro or in
vivo using techniques well known in the art, by which they are incubated with candidate
substrate molecules and the conversion of substrate to product is analyzed. Selected
cell-surface receptors or adhesion molecules might be expressed in cultured cells which are
then tested for their ability to respond to biochemical stimuli or for their affinity with
other cell types that express cell-surface molecules to which the undiversified adhesion
molecule would be expected to bind, respectively. Antibody polypeptides selected
according to the invention are of use diagnostically in Western analysis and in situ protein
detection by standard immunohistochemical procedures; for use in these applications, the
antibodies of a selected repertoire may be labelled in accordance with techniques known
to the art. In addition, such antibody polypeptides may be used preparatively in affinity
chromatography procedures, when complexed to a chromatographic support, such as a
resin. All such techniques are well known to one of skill in the art. 

Therapeutic and prophylactic uses of proteins prepared according to the invention involve 
the administration of polypeptides selected according to the invention to a recipient 
mammal, such as a human. Of particular use in this regard are antibodies, other receptors
(including, but not limited to T-cell receptors) and in the case in which an antibody or 
receptor was used as either a generic or target ligand, proteins which bind to them. 

Substantially pure antibodies or binding proteins thereof of at least 90 to 95% 
homogeneity are preferred for administration to a mammal, and 98 to 99% or more
homogeneity is most preferred for pharmaceutical uses, especially when the mammal is a
human. Once purified, partially or to homogeneity as desired, the selected polypeptides
may be used diagnostically or therapeutically (including extracorporeally) or in
developing and performing assay procedures, immunofluorescent stainings and the like
(Lefkovits and Pernis, (1979 and 1981) Immunological Methods, Volumes I and II,
Academic Press, NY). 

The selected antibodies or binding proteins thereof of the present invention will typically 
find use in preventing, suppressing or treating inflammatory states, allergic 
hypersensitivity, cancer, bacterial or viral infection, and autoimmune disorders (which
include, but are not limited to, Type I diabetes, multiple sclerosis, rheumatoid arthritis, 
systemic lupus erythematosus, Crohn's disease and myasthenia gravis). 

In the instant application, the term "prevention" involves administration of the protective 
composition prior to the induction of the disease. "Suppression" refers to administration
of the composition after an inductive event, but prior to the clinical appearance of the 
disease. "Treatment" involves administration of the protective composition after disease 
symptoms become manifest. 

Animal model systems which can be used to screen the effectiveness of the antibodies or
binding proteins thereof in protecting against or treating the disease are available.
Methods for the testing of systemic lupus erythematosus (SLE) in susceptible mice are
known in the art (Knight et al (1978) J. Exp. Med., 147: 1653; Reinersten et al (1978)
New Eng. J. Med., 299: 515). Myasthenia Gravis (MG) is tested in SJL/J female mice by
inducing the disease with soluble AchR protein from another species (Lindstrom et al
(1988) Adv. Immunol., 42: 233). Arthritis is induced in a susceptible strain of mice by
injection of Type II collagen (Stuart et al (1984) Ann. Rev. Immunol., 42: 233). A model
by which adjuvant arthritis is induced in susceptible rats by injection of mycobacterial
heat shock protein has been described (Van Eden et al. (1988) Nature, 331: 171).
Thyroiditis is induced in mice by administration of thyroglobulin as described (Maron et
al (1980) J. Exp. Med., 152: 1115). Insulin dependent diabetes mellitus (IDDM) occurs
naturally or can be induced in certain strains of mice such as those described by Kanasawa
et al. (1984) Diabetologia, 27: 113. EAE in mouse and rat serves as a model for MS in
human. In this model, the demyelinating disease is induced by administration of myelin 
basic protein (see Paterson (1986) Textbook of Immunopathology, Mischer et al, eds., 
Grune and Stratton, New York, pp. 179-213; McFarlin et al. (1973) Science, 179: 478; 
and Satoh et al (1987) J. Immunol, 138: 179). 

The selected antibodies, receptors (including, but not limited to T-cell receptors) or 
binding proteins thereof of the present invention may also be used in combination with 
other antibodies, particularly monoclonal antibodies (MAbs) reactive with other markers
on human cells responsible for the diseases. For example, suitable T-cell markers can
include those grouped into the so-called "Clusters of Differentiation," as named by the
First International Leukocyte Differentiation Workshop (Bernhard et al (1984) Leukocyte 
Typing, Springer Verlag, NY). 

Generally, the present selected antibodies, receptors or binding proteins will be utilized in 
purified form together with pharmacologically appropriate carriers. Typically, these
carriers include aqueous or alcoholic/aqueous solutions, emulsions or suspensions, any 
including saline and/or buffered media. Parenteral vehicles include sodium chloride 
solution, Ringer's dextrose, dextrose and sodium chloride and lactated Ringer's. Suitable 
physiologically-acceptable adjuvants, if necessary to keep a polypeptide complex in 
suspension, may be chosen from thickeners such as carboxymethylcellulose,
polyvinylpyrrolidone, gelatin and alginates. 

Intravenous vehicles include fluid and nutrient replenishers and electrolyte replenishers, 
such as those based on Ringer's dextrose. Preservatives and other additives, such as 
antimicrobials, antioxidants, chelating agents and inert gases, may also be present (Mack
(1982) Remington's Pharmaceutical Sciences, 16th Edition).

The selected polypeptides of the present invention may be used as separately administered
compositions or in conjunction with other agents. These can include various
immunotherapeutic drugs, such as cyclosporine, methotrexate, adriamycin or cisplatinum,
and immunotoxins. Pharmaceutical compositions can include "cocktails" of various
cytotoxic or other agents in conjunction with the selected antibodies, receptors or binding
proteins thereof of the present invention, or even combinations of selected polypeptides 
according to the present invention having different specificities, such as polypeptides 
selected using different target ligands, whether or not they are pooled prior to 
administration. 

The route of administration of pharmaceutical compositions according to the invention 
may be any of those commonly known to those of ordinary skill in the art. For therapy, 
including without limitation immunotherapy, the selected antibodies, receptors or binding 
proteins thereof of the invention can be administered to any patient in accordance with
standard techniques. The administration can be by any appropriate mode, including
parenterally, intravenously, intramuscularly, intraperitoneally, transdermally, via the
pulmonary route, or also, appropriately, by direct infusion with a catheter. The dosage and
frequency of administration will depend on the age, sex and condition of the patient,
concurrent administration of other drugs, contraindications and other parameters to be
taken into account by the clinician.

The selected polypeptides of this invention can be lyophilized for storage and 
reconstituted in a suitable carrier prior to use. This technique has been shown to be 
effective with conventional immunoglobulins and art-known lyophilization and 
reconstitution techniques can be employed. It will be appreciated by those skilled in the
art that lyophilization and reconstitution can lead to varying degrees of antibody activity 
loss (e.g. with conventional immunoglobulins, IgM antibodies tend to have greater 
activity loss than IgG antibodies) and that use levels may have to be adjusted upward to 
compensate. 

The compositions containing the present selected polypeptides or a cocktail thereof can be 
administered for prophylactic and/or therapeutic treatments. In certain therapeutic 
applications, an adequate amount to accomplish at least partial inhibition, suppression, 
modulation, killing, or some other measurable parameter, of a population of selected cells 
is defined as a "therapeutically-effective dose". Amounts needed to achieve this dosage
will depend upon the severity of the disease and the general state of the patient's own
immune system, but generally range from 0.005 to 5.0 mg of selected antibody, receptor
(e.g. a T-cell receptor) or binding protein thereof per kilogram of body weight, with doses
of 0.05 to 2.0 mg/kg/dose being more commonly used. For prophylactic applications,
compositions containing the present selected polypeptides or cocktails thereof may also be 
administered in similar or slightly lower dosages. 

A composition containing a selected polypeptide according to the present invention may
be utilized in prophylactic and therapeutic settings to aid in the alteration, inactivation,
killing or removal of a select target cell population in a mammal. In addition, the selected
repertoires of polypeptides described herein may be used extracorporeally or in vitro
selectively to kill, deplete or otherwise effectively remove a target cell population from a
heterogeneous collection of cells. Blood from a mammal may be combined
extracorporeally with the selected antibodies, cell-surface receptors or binding proteins 
thereof whereby the undesired cells are killed or otherwise removed from the blood for 
return to the mammal in accordance with standard techniques. 

The invention is further described, for the purposes of illustration only, in the following
examples. 

Example 1 

Antibody library design 

A. Main-chain conformation 

For five of the six antigen binding loops of human antibodies (L1, L2, L3, H1 and H2)
there are a limited number of main-chain conformations, or canonical structures (Chothia
et al (1992) J. Mol. Biol., 227: 799; Tomlinson et al (1995) EMBO J., 14: 4628;
Williams et al (1996) J. Mol. Biol., 264: 220). The most popular main-chain
conformation for each of these loops is used to provide a single known main-chain
conformation according to the invention. These are: H1 - CS 1 (79% of the expressed
repertoire), H2 - CS 3 (46%), L1 - CS 2 of Vκ (39%), L2 - CS 1 (100%), L3 - CS 1 of Vκ
(36%). The H3 loop forms a limited number of main-chain conformations for short loop
lengths (Martin et al (1996) J. Mol. Biol., 263: 800; Shirai et al (1996) FEBS Letters,
399: 1). Thus, where the H3 has a CDR3 length (as defined by Kabat et al (1991)
Sequences of proteins of immunological interest, U.S. Department of Health and Human
Services) of seven residues and has a lysine or arginine residue at position H94 and an
aspartate residue at position H101, a salt-bridge is formed between these two residues and
in most cases a single main-chain conformation is likely to be produced. There are at least
16 human antibody sequences in the EMBL data library with the required H3 length and
key residues to form this conformation and at least two crystallographic structures in the
protein data bank which can be used as a basis for antibody modelling (2cgr and 1tet).
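
The H3 criterion stated above (a CDR3 of seven residues, lysine or arginine at H94 and aspartate at H101) can be expressed as a simple predicate. The following is a minimal sketch for illustration; the function name and argument names are hypothetical, and a positive result only indicates that a single main-chain conformation is likely, as the text notes, not that it is guaranteed.

```python
def single_h3_conformation_expected(cdr3_length, h94, h101):
    """Apply the H3 sequence rule from the text to one heavy chain.

    cdr3_length: number of residues in CDR3 (Kabat definition)
    h94, h101:   one-letter amino acid codes at Kabat positions H94 and H101
    """
    has_basic_h94 = h94.upper() in {"K", "R"}   # lysine or arginine
    has_asp_h101 = h101.upper() == "D"          # aspartate
    return cdr3_length == 7 and has_basic_h94 and has_asp_h101


print(single_h3_conformation_expected(7, "R", "D"))
print(single_h3_conformation_expected(9, "K", "D"))
```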

In this case, the most frequently expressed germline gene segments which encode the 
desired loop lengths and key residues to produce the required combinations of canonical
structures are the Vh segment 3-23 (DP-47), the Jh segment JH4b, the Vκ segment
O2/O12 (DPK9) and the Jκ segment Jκ1. These segments can therefore be used in
combination as a basis to construct a library with the desired single main-chain
conformation. The Vκ segment O2/O12 (DPK9) is a member of the Vκ1 family and
therefore will bind the superantigen Protein L. The Vh segment 3-23 (DP-47) is a
member of the Vh3 family and therefore should bind the superantigen Protein A, which 
can then be used as a generic ligand. 

B. Selection of positions for variation 

Analysis of human Vh and Vκ sequences indicates that the most diverse positions in the
mature repertoire are those that make the most contacts with antigens (see Tomlinson et
al, (1996) J. Mol. Biol., 256: 813; Figure 1). These positions form the functional antigen
binding site and are therefore selected for side-chain diversification (Figure 2). H54 is a
key residue and points away from the antigen binding site in the chosen H2 canonical
structure 3 (the diversity seen at this position is due to canonical structures 1, 2 and 4
where H54 points into the binding site). In this case H55 (which points into the binding
site) is diversified instead. The diversity at these positions is created either by germline or
junctional diversity in the primary repertoire or by somatic hypermutation (Tomlinson et
al, (1996) J. Mol. Biol., 256: 813; Figure 1). Two different subsets of residues in the
antigen binding site were therefore varied to create two different library formats. In the
"primary" library the residues selected for variation are from H2, H3, L2 and L3 (diversity
in these loops is mainly the result of germline or junctional diversity). The positions
varied in this library are: H50, H52, H52a, H53, H55, H56, H58, H95, H96, H97, H98,
L50, L53, L91, L92, L93, L94 and L96 (18 residues in total, Figure 2). In the "somatic"
library the residues selected for variation are from H1, H3, L1 and the end of L3 (diversity
here is mainly the result of somatic hypermutation or junctional diversity). The positions
varied in this library are: H31, H33, H35, H95, H96, H97, H98, L30, L31, L32, L34 and
L96 (12 residues in total, Figure 2).

C. Selection of amino acid use at the positions to be varied

Side-chain diversity is introduced into the "primary" and "somatic" libraries by
incorporating either the codon NNK (which encodes all 20 amino acids and the
TAG stop codon, but not the TGA and TAA stop codons) or the codon DVT (which
encodes 22% serine and 11% each of tyrosine, asparagine, glycine, alanine, aspartate,
threonine and cysteine and which, using single, double, triple and quadruple degeneracy in
equal ratios at each position, most closely mimics the distribution of amino acid residues
in the antigen binding sites of natural human antibodies).
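
The composition of the two degenerate codons can be checked by enumerating them against the standard genetic code. The sketch below is illustrative only; the helper names `expand` and `residue_frequencies` are hypothetical, and the degenerate-base meanings (N = A/C/G/T, K = G/T, D = A/G/T, V = A/C/G) follow the usual IUPAC nucleotide codes.

```python
from collections import Counter
from itertools import product

# IUPAC degenerate nucleotide codes needed for NNK and DVT
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "D": "AGT", "V": "ACG"}

# Standard genetic code with bases ordered T, C, A, G; "*" marks a stop codon
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon such as NNK."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate_codon))]


def residue_frequencies(degenerate_codon):
    """Map each encoded residue (or '*' for stop) to its share of the codon set."""
    codons = expand(degenerate_codon)
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: n / len(codons) for aa, n in counts.items()}


for codon in ("NNK", "DVT"):
    freqs = residue_frequencies(codon)
    amino_acids = sorted(aa for aa in freqs if aa != "*")
    stops = [c for c in expand(codon) if CODON_TABLE[c] == "*"]
    print(codon, len(expand(codon)), "codons,", len(amino_acids),
          "amino acids, stop codons:", stops)
```

Under these assumptions NNK expands to 32 codons and DVT to 9, with serine represented by two of the nine DVT codons (AGT and TCT), which is where the 22% figure comes from.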

Example 2 

Library construction and selection with the generic ligands 

The "primary" and "somatic" libraries were assembled by PCR using the oligonucleotides
listed in Table 1 and the germline V gene segments DPK9 (Cox et al (1994) Eur. J.
Immunol., 24: 827) and DP-47 (Tomlinson et al (1992) J. Mol. Biol., 227: 776). Briefly,
a first round of amplification was performed using pairs of 5' (back) primers in conjunction
with NNK or DVT 3' (forward) primers together with the corresponding germline V gene
segment as template (see Table 1). This produces eight separate DNA fragments for each
of the NNK and DVT libraries. A second round of amplification was then performed
using the 5' (back) primers and the 3' (forward) primers shown in Table 1 together with
two of the purified fragments from the first round of amplification. This produces four
separate fragments for each of the NNK and DVT libraries (a "primary" Vh fragment,
5A; a "primary" Vκ fragment, 6A; a "somatic" Vh fragment, 5B; and a "somatic" Vκ
fragment, 6B).

Each of these fragments was cut and then ligated into pCLEANVH (for the Vh
fragments) or pCLEANVK (for the Vκ fragments) which contain dummy Vh and Vκ
domains, respectively, in a version of pHEN1 which does not contain any TAG codons or
peptide tags (Hoogenboom & Winter (1992) J. Mol. Biol., 227: 381). The ligations were
then electroporated into the non-suppressor E. coli strain HB2151. Phage from each of
these libraries was produced and separately selected using immunotubes coated with 10
µg/ml of the generic ligands Protein A and Protein L for the Vh and Vκ libraries,
respectively. DNA from E. coli infected with selected phage was then prepared and cut
so that the dummy Vκ inserts were replaced by the corresponding Vκ libraries.
Electroporation of these libraries results in the following insert library sizes: 9.21 x 10^8
("primary" NNK), 5.57 x 10^8 ("primary" DVT), 1.00 x 10^9 ("somatic" NNK) and 2.38 x
10^8 ("somatic" DVT). As a control for pre-selection four additional libraries were created
but without selection with the generic ligands Protein A and Protein L: insert library sizes
for these libraries were 1.29 x 10^9 ("primary" NNK), 2.40 x 10^8 ("primary" DVT), 1.16 x
10^9 ("somatic" NNK) and 2.17 x 10^8 ("somatic" DVT).

To verify the success of the pre-selection step, DNA from the selected and unselected
"primary" NNK libraries was cloned into a pUC based expression vector and
electroporated into HB2151. 96 clones were picked at random from each recloned library
and induced for expression of soluble scFv fragments. Production of functional scFv is
assayed by ELISA using Protein L to capture the scFv and then Protein A-HRP conjugate
to detect binding. Only scFv which express functional Vh and Vκ domains (no
frame-shifts, stop codons, folding or expression mutations) will give a signal using this assay.
The proportion of functional antibodies in each library (ELISA signals above background)
was 5% with the unselected "primary" NNK library and 75% with the selected version of
the same (Figure 3). Sequencing of clones which were negative in the assay confirmed the
presence of frame-shifts, stop codons, PCR mutations at critical framework residues and

Example 3 

Library selection against target ligands 

The "primary" and "somatic" NNK libraries (without pre-selection) were separately 
selected using five antigens (bovine ubiquitin, rat BIP, bovine histone, NIP-BSA and hen 
egg lysozyme) coated on immunotubes at various concentrations. After 2-4 rounds of 
selection, highly specific antibodies were obtained to all antigens except hen egg 
lysozyme. Clones were selected at random for sequencing, demonstrating a range of
antibodies to each antigen (Figure 4). 

In the second phase, phage from the pre-selected NNK and DVT libraries were mixed 1:1
to create a single "primary" library and a single "somatic" library. These libraries were
then separately selected using seven antigens (FITC-BSA, human leptin, human
thyroglobulin, BSA, hen egg lysozyme, mouse IgG and human IgG) coated on
immunotubes at various concentrations. After 2-4 rounds of selection, highly specific
antibodies were obtained to all the antigens, including hen egg lysozyme which failed to
produce positives in the previous phase of selection using the libraries that had not been
pre-selected using the generic ligands. Clones were selected at random for sequencing,
demonstrating a range of different antibodies to each antigen (Figure 4).

Example 4

Effect of pre-selection on scFv expression and production of phage bearing scFv 



To further verify the outcome of the pre-selection, DNA from the unselected and
pre-selected "primary" DVT libraries is cloned into a pUC based expression vector and
electroporated into HB2151, yielding 10^5 clones in both cases. 96 clones are picked at
random from each recloned library and induced for expression of soluble scFv fragments.
Production of functional scFv is again assayed using Protein L to capture the scFv
followed by the use of Protein A-HRP to detect bound scFv. The percentage of functional
antibodies in each library is 35.4% (unselected) and 84.4% (pre-selected), indicating a
2.4-fold increase in the number of functional members as a result of pre-selection with
Protein A and Protein L (the increase is less pronounced than with the equivalent NNK
library since the DVT codon does not encode the TAG stop codon. In the unselected NNK
library, the presence of a TAG stop codon in a non-suppressor strain such as HB2151 will
lead to termination and hence prevent functional scFv expression. Pre-selection of the
NNK library removes clones containing TAG stop codons to produce a library in which a
high proportion of members express soluble scFv).
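
The effect of the TAG codon on an unselected NNK library can be estimated with a short calculation. This is a rough sketch under stated assumptions that are not taken from the text: every codon in the degenerate set is treated as equally likely, and the varied positions are treated as independent. The function name `stop_free_fraction` is hypothetical.

```python
def stop_free_fraction(n_positions, stop_codons=1, codons_per_position=32):
    """Fraction of clones with no stop codon at any randomized position.

    Assumes every codon of the degenerate set is equally likely and that
    positions are independent. NNK has 32 codons, one of which (TAG) is a
    stop; DVT has 9 codons and no stop.
    """
    return (1 - stop_codons / codons_per_position) ** n_positions


primary_nnk = stop_free_fraction(18)        # 18 varied positions ("primary")
somatic_nnk = stop_free_fraction(12)        # 12 varied positions ("somatic")
primary_dvt = stop_free_fraction(18, 0, 9)  # DVT encodes no stop codon

print(f"primary NNK: {primary_nnk:.3f}")
print(f"somatic NNK: {somatic_nnk:.3f}")
print(f"primary DVT: {primary_dvt:.3f}")
```

On these assumptions a little over half of the "primary" NNK clones would be free of TAG codons, whereas the DVT library is unaffected by this particular defect. The estimate covers stop codons only; the frame-shifts and folding mutations mentioned in Example 2 are not modelled.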

In order to assess the effect of pre-selection of the "primary" DVT library on total scFv 
expression, the recloned unselected and pre-selected libraries (each containing 10⁵ clones 
in a pUC based expression vector) are induced for polyclonal expression of scFv 
fragments. The concentration of expressed scFv in the supernatant is then determined by 
incubating two fold dilutions (columns 1-12 in Figure 5a) of the supernatants on a Protein 
L coated ELISA plate, followed by detection with Protein A-HRP. ScFvs of known 
concentration are assayed in parallel to quantify the levels of scFv expression in the 
unselected and pre-selected DVT libraries. These are used to plot a standard curve (Figure 
5b) and from this the expression levels of the unselected and pre-selected "primary" DVT 
libraries are calculated as 12.9 µg/ml and 67.1 µg/ml respectively, i.e. a 5.2 fold increase 
in expression due to pre-selection with Protein A and Protein L. 
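
One way such a standard curve might be used is sketched below. This is not the procedure of the example itself: the standard concentrations, ELISA readings, sample reading and dilution factor are all hypothetical, and a straight-line least-squares fit over the linear range of the standards is assumed. Only the final two concentrations (12.9 and 67.1 µg/ml) are taken from the text. 

```python
# Sketch: reading a concentration off a linear ELISA standard curve.
# ASSUMPTION: every standard, reading and dilution factor below is a
# hypothetical value; the source does not give the raw data of Figure 5.


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_signal(signal, slope, intercept, dilution_factor):
    """Invert the standard curve, then correct for the sample dilution."""
    return (signal - intercept) / slope * dilution_factor


# Hypothetical scFv standards (ug/ml) and their ELISA readings.
standard_conc = [0.0, 0.25, 0.5, 1.0, 2.0]
standard_signal = [0.05, 0.30, 0.55, 1.05, 2.05]
slope, intercept = fit_line(standard_conc, standard_signal)

# Hypothetical reading for a supernatant diluted 16 fold (2**4).
example_estimate = concentration_from_signal(0.85, slope, intercept, 16)

# Fold increase computed from the two concentrations reported in the text.
expression_fold = 67.1 / 12.9
print(f"{expression_fold:.1f} fold")
```

In practice a reading would be taken from a dilution that falls within the linear range of the standards before the dilution factor is applied. 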


To assess the amount of phage bearing scFv, the unselected and pre-selected "primary" 
DVT libraries are grown and polyclonal phage is produced. Equal volumes of phage from 
the two libraries are run under denaturing conditions on a 4-12% Bis-Tris NuPAGE Gel 
with MES running buffer. The resulting gel is western blotted, probed using an anti-pIII 
antibody and exposed to X-ray film (Figure 6). The lower band in each case corresponds 
to pIII protein alone, whilst the higher band contains the pIII-scFv fusion protein. 
Quantification of the band intensities using the software package NIH Image indicates that 
pre-selection results in an 11.8 fold increase in the amount of fusion protein present in the 
phage. Indeed, 43% of the total pIII in the pre-selected phage exists as pIII-scFv fusion, 
suggesting that most phage particles will have at least one scFv displayed on the surface. 
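
The step from "43% of pIII is fusion" to "most phage particles display at least one scFv" can be illustrated with a simple probability sketch. Two assumptions are made here that do not appear in the text: that each phage particle carries between 3 and 5 copies of pIII, and that each copy is independently a fusion with probability equal to the measured fusion fraction. 

```python
# Sketch: fraction of phage particles expected to display at least one scFv.
# ASSUMPTIONS (not from the source): 3 to 5 pIII copies per particle, and
# each copy being a pIII-scFv fusion independently with probability 0.43.
FUSION_FRACTION = 0.43  # fraction of total pIII present as fusion (from the text)


def fraction_displaying(fusion_fraction, piii_copies):
    """Probability that at least one of the pIII copies is a fusion."""
    return 1.0 - (1.0 - fusion_fraction) ** piii_copies


for copies in (3, 4, 5):  # assumed range of pIII copies per particle
    share = fraction_displaying(FUSION_FRACTION, copies)
    print(f"{copies} copies: {share:.0%} of particles display scFv")
```

Under this simplified model the displaying fraction stays well above one half across the assumed copy numbers, which is consistent with the statement above. 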

Hence, not only does pre-selection using generic ligands enable enrichment of functional 
members from a repertoire but it also leads to preferential selection of those members 
which are well expressed and (if required) are able to elicit a high level of display on the 
surface of phage without being cleaved by bacterial proteases. 